# Full text of "Claude Shannon's Miscellaneous Writings"

Claude Elwood Shannon Miscellaneous Writings

Edited by N. J. A. Sloane and Aaron D. Wyner

Back in 1993, the late Aaron Wyner and I edited Claude Elwood Shannon's papers, and most of them appeared in a volume (Claude Elwood Shannon's Collected Papers) published by the IEEE Press. However, there were a number of items of lesser interest written by Shannon which we did not include (some declassified wartime memoranda, obscure AT&T Bell Labs memos, some mimeographed MIT lecture notes, etc.). These we put into a binder, held together by an Acco metal strip. We made half a dozen copies, and gave copies to the Library of Congress, the British Library, the Bell Laboratories Library, the MIT Library, Claude Shannon himself, and one or two other places. Over the years many people have asked me if it was possible to get access to this collection. I have now had this volume scanned and converted to pdf files. The total size of the files is about 450 megabytes.

Neil J. A. Sloane, October 13, 2013

Mathematical Sciences Research Center, AT&T Bell Laboratories, Murray Hill, New Jersey 07974

CONTENTS

(Note that many of these files contain more than one document.)

File 1: Front matter

This volume contains the following items. Bracketed numbers refer to the bibliography.

File 5: [5] "The Use of the Lakatos-Hickman Relay in a Subscriber Sender," Memorandum MM 40-130-179, August 3, 1940, Bell Laboratories, 7 pp. + 8 figs.

File 7: [7] "A Study of the Deflection Mechanism and Some Results on Rate Finders," Report to National Defense Research Committee, Div. 7-311-M1, circa April, 1941, 37 pp. + 15 figs.

File 9: [9] "A Height Data Smoothing Mechanism," Report to National Defense Research Committee, Div. 7-313.2-M1, Princeton Univ., May 26, 1941, 9 pp. + 9 figs.

File 11: [11] "Some Experimental Results on the Deflection Mechanism," Report to National Defense Research Committee, Div. 7-311-M1, June 26, 1941, 11 pp.

File 12: [12] "Criteria for Consistency and Uniqueness in Relay Circuits," Typescript, Sept. 8, 1941, 5 pp. + 3 figs.

File 16: [16] (With W. Feller) "On the Integration of the Ballistic Equations on the Aberdeen Analyzer," Applied Mathematics Panel Report No. 28.1, National Defense Research Committee, July 15, 1943, 9 pp.

File 16: [19] "Two New Circuits for Alternate Pulse Counting," Typescript, May 29, 1944, Bell Laboratories, 2 pp. + 3 Figs.

File 16: [20] "Counting Up or Down With Pulse Counters," Typescript, May 31, 1944, Bell Laboratories, 1 p. + 1 fig.

File 21: [21] (With B. M. Oliver) "Circuits for a P.C.M. Transmitter and Receiver," Memorandum MM 44-110-37, June 1, 1944, Bell Laboratories, 4 pp., 11 figs.

File 21: [23] "Pulse Shape to Minimize Bandwidth With Nonoverlapping Pulses," Typescript, August 4, 1944, Bell Laboratories, 4 pp.

File 24: [24] "A Mathematical Theory of Cryptography," Memorandum MM 45-110-02, Sept. 1, 1945, Bell Laboratories, 114 pp. + 25 figs.

File 26: [26] "Mixed Statistical Determinate Systems," Typescript, Sept. 19, 1945, Bell Laboratories, 17 pp.

File 27: [27] (With R. B. Blackman and H. W. Bode) "Data Smoothing and Prediction in Fire-Control Systems," Summary Technical Report, Div. 7, National Defense Research Committee, Vol. 1, Gunfire Control, Washington, DC, 1946, pp. 71-159 and 166-167. AD 200795. Also in National Military Establishment Research and Development Board, Report #13 MGC 12/1, August 15, 1948. Superseded by [51] and by R. B. Blackman, Linear Data-Smoothing and Prediction in Theory and Practice, Addison-Wesley, Reading, Mass., 1965.

File 30: [30] (With C. L. Dolph) "The Transient Behavior of a Large Number of Four-Terminal Unilateral Linear Networks Connected in Tandem," Memorandum MM 46-110-49, April 10, 1946, Bell Laboratories, 34 pp. + 16 figs.

File 31: [31] "Electronic Methods in Telephone Switching," Typescript, October 17, 1946, Bell Laboratories, 5 pp. + 1 fig.

File 31: [32] "Some Generalizations of the Sampling Theorem," Typescript, March 4, 1948, 5 pp. + 1 fig.

File 31: [34] "The Normal Ergodic Ensembles of Functions," Typescript, March 15, 1948, 5 pp.

File 31: [35] "Systems Which Approach the Ideal as P/N → ∞," Typescript, March 15, 1948, 2 pp.

File 36: [36] "Theorems on Statistical Sequences," Typescript, March 15, 1948, 8 pp.

File 36: [45] "Significance and Application [of Communication Research]," Symposium on Communication Research, 11-13 October, 1948, Research and Development Board, Department of Defense, Washington, DC, pp. 14-23, 1948.

File 46: [46] "Note on Certain Transcendental Numbers," Typescript, October 27, 1948, Bell Laboratories, 1 p.

File 46: [47] "A Case of Efficient Coding for a Very Noisy Channel," Typescript, Nov. 18, 1948, Bell Laboratories, 2 pp.

File 46: [48] "Note on Reversing a Discrete Markhoff Process," Typescript, Dec. 6, 1948, Bell Laboratories, 2 pp. + 2 Figs.

File 46: [49] "Information Theory," Typescript of abstract of talk for American Statistical Society, 1949, 5 pp.

File 46: [58] "Proof of an Integration Formula," Typescript, circa 1950, Bell Laboratories, 2 pp.

File 59: [59] "A Digital Method of Transmitting Information," Typescript, no date, circa 1950, Bell Laboratories, 3 pp.

File 59: [72] "Creative Thinking," Typescript, March 20, 1952, Bell Laboratories, 10 pp.

File 59: [74] (With E. F. Moore) "The Relay Circuit Analyzer," Memorandum MM 53-1400-9, March 31, 1953, Bell Laboratories, 14 pp. + 4 figs.

File 59: [77] "Throbac - Circuit Operation," Typescript, April 9, 1953, Bell Laboratories, 7 pp.

File 78: [78] "Tower of Hanoi," Typescript, April 20, 1953, Bell Laboratories, 4 pp.

File 78: [81] "Mathmanship or How to Give an Explicit Solution Without Actually Solving the Problem," Typescript, June 3, 1953, Bell Laboratories, 2 pp.

File 78: [84] (With E. F. Moore) "The Relay Circuit Synthesizer," Memorandum MM 53-140-52, November 30, 1953, Bell Laboratories, 22 pp. + 5 figs.

File 78: [87] "Bounds on the Derivatives and Rise Time of a Band and Amplitude Limited Signal," Typescript, April 8, 1954, Bell Laboratories, 6 pp. + 1 Fig.

File 78: [95] "Concavity of Transmission Rate as a Function of Input Probabilities," Memorandum MM 55-114-28, June 8, 1955, Bell Laboratories.

File 104: [104] "Information Theory," Seminar Notes, Massachusetts Institute of Technology, 1956 and succeeding years. Contains the following sections:

- "A skeleton key to the information theory notes," 3 pp.
- "Bounds on the tails of martingales and related questions," 19 pp.
- "Some useful inequalities for distribution functions," 3 pp.
- "A lower bound on the tail of a distribution," 9 pp.
- "A combinatorial theorem," 1 p.
- "Some results on determinants," 3 pp.
- "Upper and lower bounds for powers of a matrix with non-negative elements," 3 pp.
- "The number of sequences of a given length," 3 pp.
- "Characteristic for a language with independent letters," 4 pp.
- "The probability of error in optimal codes," 5 pp.
- "Zero error codes and the zero error capacity C_0," 10 pp.
- "Lower bound for P_ef for a completely connected channel with feedback," 1 p.
- "A lower bound for P_e when R > C," 2 pp.
- "A lower bound for P_e," 2 pp.
- "Lower bound with one type of input and many types of output," 3 pp.
- "Application of 'sphere-packing' bounds to feedback case," 8 pp.
- "A result for the memoryless feedback channel," 1 p.
- "Continuity of P_e opt as a function of transition probabilities," 1 p.
- "Codes of a fixed composition," 1 p.
- "Relation of P_e to p," 2 pp.
- "Bound on P_e for random code by simple threshold argument," 4 pp.
- "A bound on P_e for a random code," 3 pp.
- "The Feinstein bound," 2 pp.
- "Relations between probability and minimum word separation," 4 pp.
- "Inequalities for decodable codes," 3 pp.
- "Convexity of channel capacity as a function of transition probabilities," 1 pp.
- "A geometric interpretation of channel capacity," 6 pp.
- "Log moment generating function for the square of a Gaussian variate," 2 pp.
- "Upper bound on P_e for Gaussian channel by expurgated random code," 2 pp.
- "Lower bound on P_e in Gaussian channel by minimum distance argument," 2 pp.
- "The sphere packing bound for the Gaussian power limited channel," 4 pp.
- "The T-terminal channel," 7 pp.
- "Conditions for constant mutual information," 2 pp.
- "The central limit theorem with large deviations," 6 pp.
- "The Chernoff inequality," 2 pp.
- "Upper and lower bounds on the tails of distributions," 4 pp.
- "Asymptotic behavior of the distribution function," 5 pp.
- "Generalized Chebycheff and Chernoff inequalities," 1 p.
- "Channels with side information at the transmitter," 13 pp.
- "Some miscellaneous results in coding theory," 15 pp.
- "Error probability bounds for noisy channels," 20 pp.

File 105: [105] "Reliable Machines from Unreliable Components," notes of five lectures, Massachusetts Institute of Technology, Spring 1956, 24 pp.

File 105: [106] "The Portfolio Problem, and How to Pay the Forecaster," lecture notes taken by W. W. Peterson, Massachusetts Institute of Technology, Spring, 1956, 8 pp.

File 105: [107] "Notes on Relation of Error Probability to Delay in a Noisy Channel," notes of a lecture, Massachusetts Institute of Technology, Aug. 30, 1956, 3 pp.

File 105: [108] "Notes on the Kelly Betting Theory of Noisy Information," notes of a lecture, Massachusetts Institute of Technology, Aug. 31, 1956, 2 pp.

File 105: [124] "The Fourth-Dimensional Twist, or a Modest Proposal in Aid of the American Driver in England," typescript, All Souls College, Oxford, Trinity term, 1978, 7 pp. + 8 figs.

File 105: [127] "A Rubric on Rubik Cubics," Typescript, circa 1982, 6 pp.

Claude Elwood Shannon Miscellaneous Writings

Edited by N. J. A. Sloane and Aaron D.
Wyner

Mathematical Sciences Research Center, AT&T Bell Laboratories, Murray Hill, New Jersey 07974

Preface

This volume contains all of Claude Elwood Shannon's writings that we did not include in his Collected Papers.*

*Claude Elwood Shannon: Collected Papers, edited by N. J. A. Sloane and A. D. Wyner, IEEE Press, New York, 1993, xliv + 924 pp. ISBN 0-7803-0434-9.

Contents

Photograph of Claude Shannon at Bell Labs in May 1952. Caption: "In 1952, Claude E. Shannon of Bell Laboratories devised an experiment to illustrate the capabilities of telephone relays. Here, an electrical mouse finds its way unerringly through a maze, guided by information remembered in the kind of switching relays used in dial telephone systems. Experiments with the mouse helped stimulate Bell Laboratories researchers to think of new ways to use the logical powers of computers for operations other than numerical calculation."

Photograph of Claude Shannon and Dave Hagelbarger at Bell Labs in March 1955. Caption: "Claude Shannon, the originator of Information Theory, at the board and Dave Hagelbarger work out some equations needed. Their current projects include work on automata - advanced type of computing machines which are able to perform various thought functions."

Photograph of Claude Shannon taken in 1980's. Photographer unknown.

Preface

Bibliography of Claude Elwood Shannon. Comments such as "Included in Part B" refer to Parts A, B, C, D of the Collected Papers mentioned in the Preface.

This volume contains the following items. Bracketed numbers refer to the bibliography.

[5] "The Use of the Lakatos-Hickman Relay in a Subscriber Sender," Memorandum MM 40-130-179, August 3, 1940, Bell Laboratories, 7 pp. + 8 figs.

[7] "A Study of the Deflection Mechanism and Some Results on Rate Finders," Report to National Defense Research Committee, Div. 7-311-M1, circa April, 1941, 37 pp. + 15 figs.

[9] "A Height Data Smoothing Mechanism," Report to National Defense Research Committee, Div. 7-313.2-M1, Princeton Univ., May 26, 1941, 9 pp. + 9 figs.

[11] "Some Experimental Results on the Deflection Mechanism," Report to National Defense Research Committee, Div. 7-311-M1, June 26, 1941, 11 pp.

[12] "Criteria for Consistency and Uniqueness in Relay Circuits," Typescript, Sept. 8, 1941, 5 pp. + 3 figs.

[16] (With W. Feller) "On the Integration of the Ballistic Equations on the Aberdeen Analyzer," Applied Mathematics Panel Report No. 28.1, National Defense Research Committee, July 15, 1943, 9 pp.

[19] "Two New Circuits for Alternate Pulse Counting," Typescript, May 29, 1944, Bell Laboratories, 2 pp. + 3 Figs.

[20] "Counting Up or Down With Pulse Counters," Typescript, May 31, 1944, Bell Laboratories, 1 p. + 1 fig.

[21] (With B. M. Oliver) "Circuits for a P.C.M. Transmitter and Receiver," Memorandum MM 44-110-37, June 1, 1944, Bell Laboratories, 4 pp., 11 figs.

[23] "Pulse Shape to Minimize Bandwidth With Nonoverlapping Pulses," Typescript, August 4, 1944, Bell Laboratories, 4 pp.

[24] "A Mathematical Theory of Cryptography," Memorandum MM 45-110-02, Sept. 1, 1945, Bell Laboratories, 114 pp. + 25 figs.

[26] "Mixed Statistical Determinate Systems," Typescript, Sept. 19, 1945, Bell Laboratories, 17 pp.

[27] (With R. B. Blackman and H. W. Bode) "Data Smoothing and Prediction in Fire-Control Systems," Summary Technical Report, Div. 7, National Defense Research Committee, Vol. 1, Gunfire Control, Washington, DC, 1946, pp. 71-159 and 166-167. AD 200795. Also in National Military Establishment Research and Development Board, Report #13 MGC 12/1, August 15, 1948. Superseded by [51] and by R. B. Blackman, Linear Data-Smoothing and Prediction in Theory and Practice, Addison-Wesley, Reading, Mass., 1965.

[30] (With C. L. Dolph) "The Transient Behavior of a Large Number of Four-Terminal Unilateral Linear Networks Connected in Tandem," Memorandum MM 46-110-49, April 10, 1946, Bell Laboratories, 34 pp. + 16 figs.
[31] "Electronic Methods in Telephone Switching," Typescript, October 17, 1946, Bell Laboratories, 5 pp. + 1 fig.

[32] "Some Generalizations of the Sampling Theorem," Typescript, March 4, 1948, 5 pp. + 1 fig.

[34] "The Normal Ergodic Ensembles of Functions," Typescript, March 15, 1948, 5 pp.

[35] "Systems Which Approach the Ideal as P/N → ∞," Typescript, March 15, 1948, 2 pp.

[36] "Theorems on Statistical Sequences," Typescript, March 15, 1948, 8 pp.

[45] "Significance and Application [of Communication Research]," Symposium on Communication Research, 11-13 October, 1948, Research and Development Board, Department of Defense, Washington, DC, pp. 14-23, 1948.

[46] "Note on Certain Transcendental Numbers," Typescript, October 27, 1948, Bell Laboratories, 1 p.

[47] "A Case of Efficient Coding for a Very Noisy Channel," Typescript, Nov. 18, 1948, Bell Laboratories, 2 pp.

[48] "Note on Reversing a Discrete Markhoff Process," Typescript, Dec. 6, 1948, Bell Laboratories, 2 pp. + 2 Figs.

[49] "Information Theory," Typescript of abstract of talk for American Statistical Society, 1949, 5 pp.

[58] "Proof of an Integration Formula," Typescript, circa 1950, Bell Laboratories, 2 pp.

[59] "A Digital Method of Transmitting Information," Typescript, no date, circa 1950, Bell Laboratories, 3 pp.

[72] "Creative Thinking," Typescript, March 20, 1952, Bell Laboratories, 10 pp.

[74] (With E. F. Moore) "The Relay Circuit Analyzer," Memorandum MM 53-1400-9, March 31, 1953, Bell Laboratories, 14 pp. + 4 figs.

[77] "Throbac - Circuit Operation," Typescript, April 9, 1953, Bell Laboratories, 7 pp.

[78] "Tower of Hanoi," Typescript, April 20, 1953, Bell Laboratories, 4 pp.

[81] "Mathmanship or How to Give an Explicit Solution Without Actually Solving the Problem," Typescript, June 3, 1953, Bell Laboratories, 2 pp.

[84] (With E. F. Moore) "The Relay Circuit Synthesizer," Memorandum MM 53-140-52, November 30, 1953, Bell Laboratories, 22 pp. + 5 figs.
[87] "Bounds on the Derivatives and Rise Time of a Band and Amplitude Limited Signal," Typescript, April 8, 1954, Bell Laboratories, 6 pp. + 1 Fig.

[95] "Concavity of Transmission Rate as a Function of Input Probabilities," Memorandum MM 55-114-28, June 8, 1955, Bell Laboratories.

[104] "Information Theory," Seminar Notes, Massachusetts Institute of Technology, 1956 and succeeding years. Contains the following sections:

- "A skeleton key to the information theory notes," 3 pp.
- "Bounds on the tails of martingales and related questions," 19 pp.
- "Some useful inequalities for distribution functions," 3 pp.
- "A lower bound on the tail of a distribution," 9 pp.
- "A combinatorial theorem," 1 p.
- "Some results on determinants," 3 pp.
- "Upper and lower bounds for powers of a matrix with non-negative elements," 3 pp.
- "The number of sequences of a given length," 3 pp.
- "Characteristic for a language with independent letters," 4 pp.
- "The probability of error in optimal codes," 5 pp.
- "Zero error codes and the zero error capacity C_0," 10 pp.
- "Lower bound for P_ef for a completely connected channel with feedback," 1 p.
- "A lower bound for P_e when R > C," 2 pp.
- "A lower bound for P_e," 2 pp.
- "Lower bound with one type of input and many types of output," 3 pp.
- "Application of 'sphere-packing' bounds to feedback case," 8 pp.
- "A result for the memoryless feedback channel," 1 p.
- "Continuity of P_e opt as a function of transition probabilities," 1 p.
- "Codes of a fixed composition," 1 p.
- "Relation of P_e to p," 2 pp.
- "Bound on P_e for random code by simple threshold argument," 4 pp.
- "A bound on P_e for a random code," 3 pp.
- "The Feinstein bound," 2 pp.
- "Relations between probability and minimum word separation," 4 pp.
- "Inequalities for decodable codes," 3 pp.
- "Convexity of channel capacity as a function of transition probabilities," 1 pp.
- "A geometric interpretation of channel capacity," 6 pp.
- "Log moment generating function for the square of a Gaussian variate," 2 pp.
- "Upper bound on P_e for Gaussian channel by expurgated random code," 2 pp.
- "Lower bound on P_e in Gaussian channel by minimum distance argument," 2 pp.
- "The sphere packing bound for the Gaussian power limited channel," 4 pp.
- "The T-terminal channel," 7 pp.
- "Conditions for constant mutual information," 2 pp.
- "The central limit theorem with large deviations," 6 pp.
- "The Chernoff inequality," 2 pp.
- "Upper and lower bounds on the tails of distributions," 4 pp.
- "Asymptotic behavior of the distribution function," 5 pp.
- "Generalized Chebycheff and Chernoff inequalities," 1 p.
- "Channels with side information at the transmitter," 13 pp.
- "Some miscellaneous results in coding theory," 15 pp.
- "Error probability bounds for noisy channels," 20 pp.

[105] "Reliable Machines from Unreliable Components," notes of five lectures, Massachusetts Institute of Technology, Spring 1956, 24 pp.

[106] "The Portfolio Problem, and How to Pay the Forecaster," lecture notes taken by W. W. Peterson, Massachusetts Institute of Technology, Spring, 1956, 8 pp.

[107] "Notes on Relation of Error Probability to Delay in a Noisy Channel," notes of a lecture, Massachusetts Institute of Technology, Aug. 30, 1956, 3 pp.

[108] "Notes on the Kelly Betting Theory of Noisy Information," notes of a lecture, Massachusetts Institute of Technology, Aug. 31, 1956, 2 pp.

[124] "The Fourth-Dimensional Twist, or a Modest Proposal in Aid of the American Driver in England," typescript, All Souls College, Oxford, Trinity term, 1978, 7 pp. + 8 figs.

[127] "A Rubric on Rubik Cubics," Typescript, circa 1982, 6 pp.

Bibliography of Claude Elwood Shannon

[1] "A Symbolic Analysis of Relay and Switching Circuits," Transactions American Institute of Electrical Engineers, Vol. 57 (1938), pp. 713-723. (Received March 1, 1938.) Included in Part B.

[2] Letter to Vannevar Bush, Feb. 16, 1939. Printed in F.-W. Hagemeyer, Die Entstehung von Informationskonzepten in der Nachrichtentechnik: eine Fallstudie zur Theoriebildung in der Technik in Industrie- und Kriegsforschung [The Origin of Information Theory Concepts in Communication Technology: Case Study for Engineering Theory-Building in Industrial and Military Research], Doctoral Dissertation, Free Univ. Berlin, Nov. 8, 1979, 570 pp. Included in Part A.

[3] "An Algebra for Theoretical Genetics," Ph.D. Dissertation, Department of Mathematics, Massachusetts Institute of Technology, April 15, 1940, 69 pp. Included in Part C.

[4] "A Theorem on Color Coding," Memorandum 40-130-153, July 8, 1940, Bell Laboratories. Superseded by "A Theorem on Coloring the Lines of a Network." Not included.

[5] "The Use of the Lakatos-Hickman Relay in a Subscriber Sender," Memorandum MM 40-130-179, August 3, 1940, Bell Laboratories, 7 pp. + 8 figs. Included in this volume.

[7] "A Study of the Deflection Mechanism and Some Results on Rate Finders," Report to National Defense Research Committee, Div. 7-311-M1, circa April, 1941, 37 pp. + 15 figs. Included in this volume.

[8] "Backlash in Overdamped Systems," Report to National Defense Research Committee, Princeton Univ., May 14, 1941, 6 pp. Abstract only included in Part B.

[9] "A Height Data Smoothing Mechanism," Report to National Defense Research Committee, Div. 7-313.2-M1, Princeton Univ., May 26, 1941, 9 pp. + 9 figs. Included in this volume.

[10] "The Theory of Linear Differential and Smoothing Operators," Report to National Defense Research Committee, Div. 7-313.1-M1, Princeton Univ., June 8, 1941, 11 pp. Not included.

[11] "Some Experimental Results on the Deflection Mechanism," Report to National Defense Research Committee, Div. 7-311-M1, June 26, 1941, 11 pp. Included in this volume.

[12] "Criteria for Consistency and Uniqueness in Relay Circuits," Typescript, Sept. 8, 1941, 5 pp. + 3 figs. Included in this volume.

[13] "The Theory and Design of Linear Differential Equation Machines," Report to the Services 20, Div. 7-311-M2, Jan.
1942, Bell Laboratories, 73 pp. + 30 figs. Included in Part B.

[14] (With John Riordan) "The Number of Two-Terminal Series-Parallel Networks," Journal of Mathematics and Physics, Vol. 21 (August, 1942), pp. 83-93. Included in Part B.

[15] "Analogue of the Vernam System for Continuous Time Series," Memorandum MM 43-110-44, May 10, 1943, Bell Laboratories, 4 pp. + 4 figs. Included in Part A.

[16] (With W. Feller) "On the Integration of the Ballistic Equations on the Aberdeen Analyzer," Applied Mathematics Panel Report No. 28.1, National Defense Research Committee, July 15, 1943, 9 pp. Included in this volume.

[17] "Pulse Code Modulation," Memorandum MM 43-110-43, December 1, 1943, Bell Laboratories. Not included.

[18] "Feedback Systems with Periodic Loop Closure," Memorandum MM 44-110-32, March 16, 1944, Bell Laboratories. Not included.

[19] "Two New Circuits for Alternate Pulse Counting," Typescript, May 29, 1944, Bell Laboratories, 2 pp. + 3 Figs. Included in this volume.

[20] "Counting Up or Down With Pulse Counters," Typescript, May 31, 1944, Bell Laboratories, 1 p. + 1 fig. Included in this volume.

[21] (With B. M. Oliver) "Circuits for a P.C.M. Transmitter and Receiver," Memorandum MM 44-110-37, June 1, 1944, Bell Laboratories, 4 pp., 11 figs. Included in this volume.

[22] "The Best Detection of Pulses," Memorandum MM 44-110-28, June 22, 1944, Bell Laboratories, 3 pp. Included in Part A.

[23] "Pulse Shape to Minimize Bandwidth With Nonoverlapping Pulses," Typescript, August 4, 1944, Bell Laboratories, 4 pp. Included in this volume.

[24] "A Mathematical Theory of Cryptography," Memorandum MM 45-110-02, Sept. 1, 1945, Bell Laboratories, 114 pp. + 25 figs. Superseded by the following paper. Included in this volume.

[25] "Communication Theory of Secrecy Systems," Bell System Technical Journal, Vol. 28 (1949), pp. 656-715. "The material in this paper appeared originally in a confidential report 'A Mathematical Theory of Cryptography', dated Sept.
1, 1945, which has now been declassified." Included in Part A.

[26] "Mixed Statistical Determinate Systems," Typescript, Sept. 19, 1945, Bell Laboratories, 17 pp. Included in this volume.

[27] (With R. B. Blackman and H. W. Bode) "Data Smoothing and Prediction in Fire-Control Systems," Summary Technical Report, Div. 7, National Defense Research Committee, Vol. 1, Gunfire Control, Washington, DC, 1946, pp. 71-159 and 166-167. AD 200795. Also in National Military Establishment Research and Development Board, Report #13 MGC 12/1, August 15, 1948. Superseded by [51] and by R. B. Blackman, Linear Data-Smoothing and Prediction in Theory and Practice, Addison-Wesley, Reading, Mass., 1965. Included in this volume.

[28] (With B. M. Oliver) "Communication System Employing Pulse Code Modulation," Patent 2,801,281. Filed Feb. 21, 1946, granted July 30, 1957. Not included.

[29] (With B. D. Holbrook) "A Sender Circuit For Panel or Crossbar Telephone Systems," Patent application circa 1946, application dropped April 13, 1948. Not included.

[30] (With C. L. Dolph) "The Transient Behavior of a Large Number of Four-Terminal Unilateral Linear Networks Connected in Tandem," Memorandum MM 46-110-49, April 10, 1946, Bell Laboratories, 34 pp. + 16 figs. Included in this volume.

[31] "Electronic Methods in Telephone Switching," Typescript, October 17, 1946, Bell Laboratories, 5 pp. + 1 fig. Included in this volume.

[32] "Some Generalizations of the Sampling Theorem," Typescript, March 4, 1948, 5 pp. + 1 fig. Included in this volume.

[33] (With J. R. Pierce and J. W. Tukey) "Cathode-Ray Device," Patent 2,576,040. Filed March 10, 1948, granted Nov. 20, 1951. Not included.

[34] "The Normal Ergodic Ensembles of Functions," Typescript, March 15, 1948, 5 pp. Included in this volume.

[35] "Systems Which Approach the Ideal as P/N → ∞," Typescript, March 15, 1948, 2 pp. Included in this volume.

[36] "Theorems on Statistical Sequences," Typescript, March 15, 1948, 8 pp.
Included in this volume.

[37] "A Mathematical Theory of Communication," Bell System Technical Journal, Vol. 27 (July and October 1948), pp. 379-423 and 623-656. Reprinted in D. Slepian, editor, Key Papers in the Development of Information Theory, IEEE Press, NY, 1974. Included in Part A.

[38] (With Warren Weaver) The Mathematical Theory of Communication, University of Illinois Press, Urbana, IL, 1949, vi + 117 pp. Reprinted (and repaginated) 1963. The section by Shannon is essentially identical to the previous item. Not included.

[39] (With Warren Weaver) Mathematische Grundlagen der Informationstheorie, Scientia Nova, Oldenbourg Verlag, Munich, 1976, pp. 143. German translation of the preceding book. Not included.

[40] (With B. M. Oliver and J. R. Pierce) "The Philosophy of PCM," Proceedings Institute of Radio Engineers, Vol. 36 (1948), pp. 1324-1331. (Received May 24, 1948.) Included in Part A.

[41] "Samples of Statistical English," Typescript, June 11, 1948, Bell Laboratories, 3 pp. Included in this volume.

[42] "Network Rings," Typescript, June 11, 1948, Bell Laboratories, 26 pp. + 4 figs. Included in Part B.

[43] "Communication in the Presence of Noise," Proceedings Institute of Radio Engineers, Vol. 37 (1949), pp. 10-21. (Received July 23, 1940 [1948?].) Reprinted in D. Slepian, editor, Key Papers in the Development of Information Theory, IEEE Press, NY, 1974. Reprinted in Proceedings Institute of Electrical and Electronic Engineers, Vol. 72 (1984), pp. 1192-1201. Included in Part A.

[44] "A Theorem on Coloring the Lines of a Network," Journal of Mathematics and Physics, Vol. 28 (1949), pp. 148-151. (Received Sept. 14, 1948.) Included in Part B.

[45] "Significance and Application [of Communication Research]," Symposium on Communication Research, 11-13 October, 1948, Research and Development Board, Department of Defense, Washington, DC, pp. 14-23, 1948. Included in this volume.
[46] "Note on Certain Transcendental Numbers," Typescript, October 27, 1948, Bell Laboratories, 1 p. Included in this volume.

[47] "A Case of Efficient Coding for a Very Noisy Channel," Typescript, Nov. 18, 1948, Bell Laboratories, 2 pp. Included in this volume.

[48] "Note on Reversing a Discrete Markhoff Process," Typescript, Dec. 6, 1948, Bell Laboratories, 2 pp. + 2 Figs. Included in this volume.

[49] "Information Theory," Typescript of abstract of talk for American Statistical Society, 1949, 5 pp. Included in this volume.

[50] "The Synthesis of Two-Terminal Switching Circuits," Bell System Technical Journal, Vol. 28 (Jan., 1949), pp. 59-98. Included in Part B.

[51] (With H. W. Bode) "A Simplified Derivation of Linear Least Squares Smoothing and Prediction Theory," Proceedings Institute of Radio Engineers, Vol. 38 (1950), pp. 417-425. (Received July 13, 1949.) Included in Part B.

[52] "Review of Transformations on Lattices and Structures of Logic by Stephen A. Kiss," Proceedings Institute of Radio Engineers, Vol. 37 (1949), p. 1163. Included in Part B.

[53] "Review of Cybernetics, or Control and Communication in the Animal and the Machine by Norbert Wiener," Proceedings Institute of Radio Engineers, Vol. 37 (1949), p. 1305. Included in Part B.

[54] "Programming a Computer for Playing Chess," Philosophical Magazine, Series 7, Vol. 41 (No. 314, March 1950), pp. 256-275. (Received Nov. 8, 1949.) Reprinted in D. N. L. Levy, editor, Computer Chess Compendium, Springer-Verlag, NY, 1988. Included in Part B.

[55] "A Chess-Playing Machine," Scientific American, Vol. 182 (No. 2, February 1950), pp. 48-51. Reprinted in The World of Mathematics, edited by James R. Newman, Simon and Schuster, NY, Vol. 4, 1956, pp. 2124-2133. Included in Part B.

[56] "Memory Requirements in a Telephone Exchange," Bell System Technical Journal, Vol. 29 (1950), pp. 343-349. (Received Dec. 7, 1949.) Included in Part B.
[57] "A Symmetrical Notation for Numbers," American Mathematical Monthly, Vol. 57 (Feb., 1950), pp. 90-93. Included in Part B.

[58] "Proof of an Integration Formula," Typescript, circa 1950, Bell Laboratories, 2 pp. Included in this volume.

[59] "A Digital Method of Transmitting Information," Typescript, no date, circa 1950, Bell Laboratories, 3 pp. Included in this volume.

[60] "Communication Theory — Exposition of Fundamentals," in "Report of Proceedings, Symposium on Information Theory, London, Sept., 1950," Institute of Radio Engineers, Transactions on Information Theory, No. 1 (February, 1953), pp. 44-47. Included in Part A.

[61] "General Treatment of the Problem of Coding," in "Report of Proceedings, Symposium on Information Theory, London, Sept., 1950," Institute of Radio Engineers, Transactions on Information Theory, No. 1 (February, 1953), pp. 102-104. Included in Part A.

[62] "The Lattice Theory of Information," in "Report of Proceedings, Symposium on Information Theory, London, Sept., 1950," Institute of Radio Engineers, Transactions on Information Theory, No. 1 (February, 1953), pp. 105-107. Included in Part A.

[63] (With E. C. Cherry, S. H. Moss, Dr. Uttley, I. J. Good, W. Lawrence and W. P. Anderson) "Discussion of Preceding Three Papers," in "Report of Proceedings, Symposium on Information Theory, London, Sept., 1950," Institute of Radio Engineers, Transactions on Information Theory, No. 1 (February, 1953), pp. 169-174. Included in Part A.

[64] "Review of Description of a Relay Computer, by the Staff of the [Harvard] Computation Laboratory," Proceedings Institute of Radio Engineers, Vol. 38 (1950), p. 449. Included in Part B.

[65] "Recent Developments in Communication Theory," Electronics, Vol. 23 (April, 1950), pp. 80-83. Included in Part A.

[66] German translation of [65], in Tech. Mitt. P.T.T., Bern, Vol. 28 (1950), pp. 337-342. Not included.
[67] "A Method of Power or Signal Transmission To a Moving Vehicle," Memorandum for Record, July 19, 1950, Bell Laboratories, 2 pp. + 4 figs. Included in Part B. [68] "Some Topics in Information Theory," in Proceedings International Congress of Mathematicians (Cambridge, Mass., Aug. 30 - Sept. 6, 1950) , American Mathematical Society, Vol. II (1952), pp. 262-263. Included in Part A. [69] "Prediction and Entropy of Printed English," Bell System Technical Journal, Vol. 30 (1951), pp. 50-64. (Received Sept. 15, 1950.) Reprinted in D. Slepian, editor, Key Papers in the Development of Information Theory, IEEE Press, NY, 1974. Included in Part A. [70] "Presentation of a Maze Solving Machine," in Cybernetics: Circular, Causal and Feedback Mechanisms in Biological and Social Systems, Transactions Eighth Conference, March 15-16, 1951, New York, N. K, edited by H. von Foerster, M. Mead and H. L. Teuber, Josiah Macy Jr. Foundation, New York, 1952, pp. 169-181. Included in Part B. [71] "Control Apparatus," Patent application Aug. 1951, dropped Jan. 21, 1954. Not included. pp. Included in this volume. [73] "A Mind-Reading (?) Machine," Typescript, March 18, 1953, Bell Laboratories, 4 pp. Included in Part B. [74] (With E. F. Moore) "The Relay Circuit Analyzer," Memorandum MM 53-1400-9, March 31, 1953, Bell Laboratories, 14 pp. + 4 figs. Included in this volume. [75] "The Potentialities of Computers," Typescript, April 3, 1953, Bell Laboratories. Included in Part B. [76] "Throbac I," Typescript, April 9, 1953, Bell Laboratories, 5 pp. Included in Part B. [72] "Creative Thinking," 20, 1952, Bell Laboratories, 10 [77] "Throbac - Circuit Operation," Typescript, April 9, 1953, Bell Laboratories, 7 pp. Included in this volume. -7- [78] "Tower of Hanoi," Typescript, April 20, 1953, Bell Laboratories, 4 pp. Included in this volume. [79] (With E. F. Moore) "Electrical Circuit Analyzer," Patent 2,776,405. Filed May 18, 1953, granted Jan. 1, 1957. Not included. [80] (With E. F. 
Moore) "Machine Aid for Switching Circuit Design," Proceedings Institute of Radio Engineers, Vol. 41 (1953), pp. 1348-1351. (Received May 28, 1953.) Included in Part B. [81] "Mathmanship or How to Give an Explicit Solution Without Actually Solving the Problem," Typescript, June 3, 1953, Bell Laboratories, 2 pp. Included in this volume. [82] "Computers and Automata," Proceedings Institute of Radio Engineers, Vol. 41 (1953), pp. 1234-1241. (Received July 17, 1953.) Reprinted in Methodos, Vol. 6 (1954), pp. 115-130. Included in Part B. [83] "Realization of All 16 Switching Functions of Two Variables Requires 18 Contacts," Memorandum MM 53-1400-40, November 17, 1953, Bell Laboratories, 4 pp. + 2 figs. Included in Part B. [84] (With E. F. Moore) "The Relay Circuit Synthesizer," Memorandum MM 53-140-52, November 30, 1953, Bell Laboratories, 26 pp. + 5 figs. Included in this volume. [85] (With D. W. Hagelbarger) "A Relay Laboratory Outfit for Colleges," Memorandum MM 54-114-17, January 10, 1954, Bell Laboratories. Included in Part B. [86] "Efficient Coding of a Binary Source With One Very Infrequent Symbol," Memorandum MM 54-114-7, January 29, 1954, Bell Laboratories. Included in Part A. [87] "Bounds on the Derivatives and Rise Time of a Band and Amplitude Limited Signal," Typescript, April 8, 1954, Bell Laboratories, 6 pp. + 1 fig. Included in this volume. [88] (With Edward F. Moore) "Reliable Circuits Using Crummy Relays," Memorandum 54-114-42, Nov. 29, 1954, Bell Laboratories. Published as the following two items. [89] (With Edward F. Moore) "Reliable Circuits Using Less Reliable Relays I," Journal Franklin Institute, Vol. 262 (Sept., 1956), pp. 191-208. Included in Part B. [90] (With Edward F. Moore) "Reliable Circuits Using Less Reliable Relays II," Journal Franklin Institute, Vol. 262 (Oct., 1956), pp. 281-297. Included in Part B.
[91] (Edited jointly with John McCarthy) Automata Studies, Annals of Mathematics Studies Number 34, Princeton University Press, Princeton, NJ, 1956, ix + 285 pp. The Preface, Table of Contents, and the two papers by Shannon are included in Part B. [92] (With John McCarthy), Studien zur Theorie der Automaten, Munich, 1974. (German translation of the preceding work.) [93] "A Universal Turing Machine With Two Internal States," Memorandum 54-114-38, May 15, 1954, Bell Laboratories. Published in Automata Studies, pp. 157-165. Included in Part B. [94] (With Karel de Leeuw, Edward F. Moore and N. Shapiro) "Computability by Probabilistic Machines," Memorandum 54-114-37, Oct. 21, 1954, Bell Laboratories. Published in [91], pp. 183-212. Included in Part B. [95] "Concavity of Transmission Rate as a Function of Input Probabilities," Memorandum MM 55-114-28, June 8, 1955, Bell Laboratories. Included in this volume. [96] "Some Results on Ideal Rectifier Circuits," Memorandum MM 55-114-29, June 8, 1955, Bell Laboratories. Included in Part B. [97] "The Simultaneous Synthesis of s Switching Functions of n Variables," Memorandum MM 55-114-30, June 8, 1955, Bell Laboratories. Included in Part B. [98] (With D. W. Hagelbarger) "Concavity of Resistance Functions," Journal Applied Physics, Vol. 27 (1956), pp. 42-43. (Received August 1, 1955.) Included in Part B. [99] "Game Playing Machines," Journal Franklin Institute, Vol. 260 (1955), pp. 447-453. (Delivered Oct. 19, 1955.) Included in Part B. [100] "Information Theory," Encyclopedia Britannica, Chicago, IL, 14th Edition, 1968 printing, Vol. 12, pp. 246B-249. (Written circa 1955.) Included in Part A. [101] "Cybernetics," Encyclopedia Britannica, Chicago, IL, 14th Edition, 1968 printing, Vol. 12. (Written circa 1955.) Not included. [102] "The Rate of Approach to Ideal Coding (Abstract)," Proceedings Institute of Radio Engineers, Vol. 43 (1955), p. 356. Included in Part A.
[103] "The Bandwagon (Editorial)," Institute of Radio Engineers, Transactions on Information Theory, Vol. IT-2 (March, 1956), p. 3. Included in Part A. [104] "Information Theory," Seminar Notes, Massachusetts Institute of Technology, 1956 and succeeding years. Included in this volume. Contains the following sections: "A skeleton key to the information theory notes," 3 pp. "Bounds on the tails of martingales and related questions," 19 pp. "Some useful inequalities for distribution functions," 3 pp. "A lower bound on the tail of a distribution," 9 pp. "A combinatorial theorem," 1 p. "Some results on determinants," 3 pp. "Upper and lower bounds for powers of a matrix with non-negative elements," 3 pp. "The number of sequences of a given length," 3 pp. "Characteristic for a language with independent letters," 4 pp. "The probability of error in optimal codes," 5 pp. "Zero error codes and the zero error capacity C_0," 10 pp. "Lower bound for P_ef for a completely connected channel with feedback," 1 p. "A lower bound for P_e when R > C," 2 pp. "A lower bound for P_e," 2 pp. "Lower bound with one type of input and many types of output," 3 pp. "Application of 'sphere-packing' bounds to feedback case," 8 pp. "A result for the memoryless feedback channel," 1 p. "Continuity of P_e opt as a function of transition probabilities," 1 p. "Codes of a fixed composition," 1 p. "Relation of P_e to p," 2 pp. "Bound on P_e for random code by simple threshold argument," 4 pp. "A bound on P_e for a random code," 3 pp. "The Feinstein bound," 2 pp. "Relations between probability and minimum word separation," 4 pp. "Inequalities for decodable codes," 3 pp. "Convexity of channel capacity as a function of transition probabilities," 1 p. "A geometric interpretation of channel capacity," 6 pp. "Log moment generating function for the square of a Gaussian variate," 2 pp. "Upper bound on P_e for Gaussian channel by expurgated random code," 2 pp.
"Lower bound on P_e in Gaussian channel by minimum distance argument," 2 pp. "The sphere packing bound for the Gaussian power limited channel," 4 pp. "The ^-terminal channel," 7 pp. "Conditions for constant mutual information," 2 pp. "The central limit theorem with large deviations," 6 pp. "The Chernoff inequality," 2 pp. "Upper and lower bounds on the tails of distributions," 4 pp. "Asymptotic behavior of the distribution function," 5 pp. "Generalized Chebycheff and Chernoff inequalities," 1 p. "Channels with side information at the transmitter," 13 pp. "Some miscellaneous results in coding theory," 15 pp. "Error probability bounds for noisy channels," 20 pp. [105] "Reliable Machines from Unreliable Components," notes of five lectures, Massachusetts Institute of Technology, Spring 1956, 24 pp. Not included. [106] "The Portfolio Problem, and How to Pay the Forecaster," lecture notes taken by W. W. Peterson, Massachusetts Institute of Technology, Spring, 1956, 8 pp. Included in this volume. [107] "Notes on Relation of Error Probability to Delay in a Noisy Channel," notes of a lecture, Massachusetts Institute of Technology, Aug. 30, 1956, 3 pp. Included in this volume. [108] "Notes on the Kelly Betting Theory of Noisy Information," notes of a lecture, Massachusetts Institute of Technology, Aug. 31, 1956, 2 pp. Included in this volume. [109] "The Zero Error Capacity of a Noisy Channel," Institute of Radio Engineers, Transactions on Information Theory, Vol. IT-2 (September, 1956), pp. S8-S19. Reprinted in D. Slepian, editor, Key Papers in the Development of Information Theory, IEEE Press, NY, 1974. Included in Part A. [110] (With Peter Elias and Amiel Feinstein) "A Note on the Maximum Flow Through a Network," Institute of Radio Engineers, Transactions on Information Theory, Vol. IT-2 (December, 1956), pp. 117-119. (Received July 11, 1956.) Included in Part B. [111] "Certain Results in Coding Theory for Noisy Channels," Information and Control, Vol. 1 (1957), pp. 6-25.
(Received April 22, 1957.) Reprinted in D. Slepian, editor, Key Papers in the Development of Information Theory, IEEE Press, NY, 1974. Included in Part A. [112] "Geometrische Deutung einiger Ergebnisse bei die Berechnung der Kanal Capazitat" [Geometrical meaning of some results in the calculation of channel capacity], Nachrichtentechnische Zeit. (N.T.Z.), Vol. 10 (No. 1, January 1957), pp. 1-4. Not included, since the English version is included. [113] "Some Geometrical Results in Channel Capacity," Verband Deutsche Elektrotechniker Fachber., Vol. 19 (II) (1956), pp. 13-15 = Nachrichtentechnische Fachber. (N.T.F.), Vol. 6 (1957). English version of the preceding work. Included in Part A. [114] "Von Neumann's Contribution to Automata Theory," Bulletin American Mathematical Society, Vol. 64 (No. 3, Part 2, 1958), pp. 123-129. (Received Feb. 10, 1958.) Included in Part B. [115] "A Note on a Partial Ordering for Communication Channels," Information and Control, Vol. 1 (1958), pp. 390-397. (Received March 24, 1958.) Reprinted in D. Slepian, editor, Key Papers in the Development of Information Theory, IEEE Press, NY, 1974. Included in Part A. [116] "Channels With Side Information at the Transmitter," IBM Journal Research and Development, Vol. 2 (1958), pp. 289-293. (Received Sept. 15, 1958.) Reprinted in D. Slepian, editor, Key Papers in the Development of Information Theory, IEEE Press, NY, 1974. Included in Part A. [117] "Probability of Error for Optimal Codes in a Gaussian Channel," Bell System Technical Journal, Vol. 38 (1959), pp. 611-656. (Received Oct. 17, 1958.) Included in Part A. [118] "Coding Theorems for a Discrete Source With a Fidelity Criterion," Institute of Radio Engineers, International Convention Record, Vol. 7 (Part 4, 1959), pp. 142-163. Reprinted with changes in Information and Decision Processes, edited by R. E. Machol, McGraw-Hill, NY, 1960, pp. 93-126. Reprinted in D.
Slepian, editor, Key Papers in the Development of Information Theory, IEEE Press, NY, 1974. Included in Part A. [119] "Two-Way Communication Channels," in Proceedings Fourth Berkeley Symposium Probability and Statistics, June 20 - July 30, 1960, edited by J. Neyman, Univ. Calif. Press, Berkeley, CA, Vol. 1, 1961, pp. 611-644. Reprinted in D. Slepian, editor, Key Papers in the Development of Information Theory, IEEE Press, NY, 1974. Included in Part A. [120] "Computers and Automation — Progress and Promise in the Twentieth Century," Man, Science, Learning and Education. The Semicentennial Lectures at Rice University, edited by S. W. Higginbotham, Supplement 2 to Vol. XLIX, Rice University Studies, Rice Univ., 1963, pp. 201-211. Included in Part B. [121] Papers in Information Theory and Cybernetics (in Russian), Izd. Inostr. Lit., Moscow, 1963, 824 pp. Edited by R. L. Dobrushin and O. B. Lupanova, preface by A. N. Kolmogorov. Contains Russian translations of [1], [6], [14], [25], [37], [40], [43], [44], [50], [51], [54]-[56], [65], [68]-[70], [80], [82], [89], [90], [93], [94], [99], [103], [109]-[111], [113]-[119]. [122] (With R. G. Gallager and E. R. Berlekamp) "Lower Bounds to Error Probability for Coding on Discrete Memoryless Channels I," Information and Control, Vol. 10 (1967), pp. 65-103. (Received Jan. 18, 1966.) Reprinted in D. Slepian, editor, Key Papers in the Development of Information Theory, IEEE Press, NY, 1974. Included in Part A. [123] (With R. G. Gallager and E. R. Berlekamp) "Lower Bounds to Error Probability for Coding on Discrete Memoryless Channels II," Information and Control, Vol. 10 (1967), pp. 522-552. (Received Jan. 18, 1966.) Reprinted in D. Slepian, editor, Key Papers in the Development of Information Theory, IEEE Press, NY, 1974. Included in Part A. [124] "The Fourth-Dimensional Twist, or a Modest Proposal in Aid of the American Driver in England," typescript, All Souls College, Oxford, Trinity term, 1978, 7 pp. + 8 figs.
Included in this volume. [125] "Claude Shannon's No-Drop Juggling Diorama," Juggler's World, Vol. 34 (March, 1982), pp. 20-22. Included in Part B. [126] "Scientific Aspects of Juggling," Typescript, circa 1980. Included in Part B. [127] "A Rubric on Rubik Cubics," Typescript, circa 1982, 6 pp. Included in this volume.

Cover Sheet for Technical Memoranda, Research Department
SUBJECT: The Use of the Lakatos-Hickman Relay in a Subscriber Sender - Case 20878
ROUTING: 1 - Patent Dept. (letter 9/27/40); 2 - W.W.Ke^all, Case File; 3 - T. C. Fry; 4 - A. B. Clark; 5 - B. D. Holbrook; 6 - G. R. Stibitz; 7 - G. V. King; 8 - Miss Hanle
MM-40-130-179
DATE: August 13, 1940
AUTHOR: C. E. Shannon
INDEX NO. S4.2

ABSTRACT
A study is made of the possibilities of using the Lakatos-Hickman type relay for the counting, registering, steering, and pulse apportioning operations in a subscriber sender. Circuits are shown for the more important parts of the circuit where it appears that the new type relay would effect an economy.

The Use of the Lakatos-Hickman Relay in a Subscriber Sender - Case 20878
August 15, 1940
MEMORANDUM FOR FILE

The Lakatos-Hickman type relay,^1 using the relay springs as part of the magnetic circuit, can be used as a very economical type of pulse counter and registration device. In fact, one such relay with twenty moving springs can count and register up to ten pulses, while the same operation requires at least five ordinary relays, and some standard circuits use as many as twenty to reduce the spring loading on the relays and the contact loading in the pulsing circuit.
It has been suggested that this new type of relay might be used for some or all of the many counting, steering, and registration circuits in a subscriber type sender. The present memorandum gives some circuits for accomplishing this. The chief problem in the design of these circuits is that of performing the various translating operations necessary in converting the incoming pulses into group and brush selections, or P.C.I. pulses as the case may be, without using more contact elements than are available on the counting relay. Two different solutions are given here. The first was made as economical as possible but at the cost of one disadvantage. Under certain conditions of contact failure in the thousands or hundreds register the sender will connect the subscriber to an incorrect number rather than connecting to a tell-tale and giving him a busy signal. The second circuit, which we will call the positive action circuit,^2 is designed to overcome this difficulty but does so at the expense of more contacts and wiring. Some compromise between these circuits may be the most desirable.

The circuits by no means represent a complete sender. It appears that the problems connected with the office code (i.e. the first two or three digits) can be handled without much difficulty. At any rate these circuits will depend on the type of decoder used, and would represent a second stage in the design. We have therefore designed what might be called a "four digit sender," considering only the problems arising in the thousands, hundreds, tens and units digits. We also have omitted consideration of the parts of the circuit used for control and supervisory purposes, since these can be easily handled by existing circuits, and do not directly involve the new type relay. Our chief purpose is to show that the new type counter contains sufficient contact elements for most of the steering and counting circuits of the subscriber sender. It is always possible to add more contacts at any stage in the new type counter by the arrangement of springs in Fig. 1, but this would be undesirable from the standpoint of standardization. At any rate it was found that even in the positive action circuit, only two stages in one register needed more contacts than are already available, and two additional ordinary relays were introduced here to carry the contact load.

^1 See "Circuit Analysis for Lakatos-Hickman Type Relay," G. R. Stibitz, MM40-150-180, Jan. 15, 1940, Case 20878.
^2 This circuit was suggested by Mr. G. V. King.

It should be pointed out that an extremely simple and economical sender (i.e., much simpler than those given here) could be designed using the new type counter were it not for the peculiar translation codes involved. Thus if we could start "from scratch" and design translation codes particularly adapted to the characteristics of the new relay, the circuits could be made very simple indeed. Even using the existing codes, which were constructed to simplify the present type circuits, the use of the new counter allows a remarkable simplicity and economy.

The circuits were designed by a combination of common sense and Boolean algebra methods. We will omit the details involved in their design. Although it is possible that a few superfluous elements remain, it is doubtful if they can be simplified very much.

Figure 2 is a block diagram of the proposed sender. In the present panel and crossbar senders, pulse counting is done in the same circuit for each digit and the numbers transferred from this counting circuit to a set of registering circuits, one for each digit, through an incoming steering chain.
The registering circuits in the panel type sender consist of a set of five ordinary relays per digit, while in the crossbar system a digit is registered on one or two verticals of a crossbar switch. In Figure 2, on the other hand, each digit has one of the new type counter relays which acts both as a pulse counter and as a register. The incoming steering chain steers the incoming pulses to the correct counter-register rather than steering the number recorded by the input pulse counter to a digit register. The input steering chain may or may not be one of the new type counters. The steering operation can be done with the new type counter, but it appears to require special devices, as for example polarized springs, in order to energize both windings of the register relays after receiving a digit. Even using the present type of steering chain a great simplification is possible, for only one wire, the pulsing lead, needs to be steered to the various digit registers, rather than the five leads of the present type sender. Another possibility is using a new type counter to count the groups of pulses and operate a set of relays S_A, S_B, S_C, S_Th, S_H, S_T, S_U which come in after the A, B, C, Th, H, T, and U digits are received and energize both coils of the corresponding registers.

After the digits are registered on the new type counters, these numbers are translated by means of the contact interconnections into the code corresponding to the incoming brush, incoming group, final brush, tens, and units selections, which are represented by a ground on one of the leads in the groups marked IB, IG, FB, T, and U, respectively. These groups of leads are connected in sequence to the revertive pulse counter by means of the revertive group counter. The revertive pulse counter will be one of the new type relays and is connected in such a way as to open the fundamental circuit and thus stop the revertive pulsing when it reaches the first ground.
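The stopping rule just described can be sketched in a few lines. This is an illustrative model only, not the relay circuit: it assumes the revertive pulse counter tests leads 0, 1, 2, ... in order, one per revertive pulse, so that a ground on lead n stops the pulsing after n + 1 pulses (the correspondence the memorandum itself gives for selections). The function name and the example lead sets are hypothetical.

```python
from typing import Iterable, Optional


def pulses_until_stop(grounded_leads: Iterable[int], n_leads: int = 10) -> Optional[int]:
    """Count revertive pulses until the counter meets the first grounded lead.

    Assumption (illustrative model): lead 0 is tested on the first pulse,
    lead 1 on the second, and so on. Returns None when no lead in the group
    is grounded, i.e. this counter never stops the pulsing.
    """
    grounded = set(grounded_leads)
    for pulse, lead in enumerate(range(n_leads), start=1):
        if lead in grounded:
            return pulse  # fundamental circuit is opened here
    return None


print(pulses_until_stop({6}))     # a single ground on lead 6
print(pulses_until_stop({2, 4}))  # two grounds: the lower lead is met first
print(pulses_until_stop({4}))     # the same group with the ground on lead 2 lost
print(pulses_until_stop(set()))   # no ground at all
```

Under these assumptions the four calls print 7, 3, 5 and None. The second and third cases show why a group with more than one grounded lead is sensitive to a single contact failure: losing the lower ground silently moves the stopping point to the next grounded lead.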
The revertive group counter or revertive steering chain, of course, steps ahead after each group of revertive pulses through the action of a slow release relay. This last steering operation cannot be done solely with one of the new type relays for it is necessary to steer ten leads in the tens and units digits. It could be done, however, with a new type counter in conjunction with four ordinary relays.

In the case of a call to a manual office the outputs of the digit registers are translated by a P.C.I. circuit into the correct P.C.I. codes. This circuit, too, can make use of the new type counter in the quadranting operation, i.e. in apportioning four quadrants to each of the four digits to be transmitted. This would be done with a sixteen stage counter (or if it is desirable to have all counters with ten stages, two of these could be connected "in series") replacing the present sequence switch.

Of course there must be an interlock between the incoming and revertive steering chains to prevent any selection being made before sufficient information has been received. This can be done by fairly standard methods.

A rough comparison can be made between the relay requirements of the present panel type sender and the design proposed here. Omitting parts of the circuit which would be substantially the same, the requirements are listed below:

[Table: number of ordinary relays in the present panel sender, and number of new type counters and ordinary relays in the proposed sender, for each operation (Input Counting, Input Steering, Registration, Revertive Counting, Revertive Steering) and in total. The numerical entries are illegible in the scan.]

In addition, a sequence switch is replaced by a new type counter. These figures are based on the positive action circuit. The other circuit uses 6 ordinary relays. This comparison of the numbers of relays involved shows only a small part of the saving, however.
The wiring and fundamental method of operation of the new circuit is much simpler, which tends both toward economy and, providing the new relay can be made sufficiently reliable, elimination of faults and errors.

It is a little more difficult to give a quantitative comparison of the proposed sender with the present crossbar type sender due to the differences in the types of circuit elements involved, but it appears that the saving would be of the same order of magnitude.

The new type counter with ten stages acts like a series of twenty relays which come in sequentially as the two coils of the relay are alternately energized. Thus after n pulses the first 2n relays are operated. If, after a series of pulses, only one of the two coils on a counter remains energized we can only be sure of the contacts on that side. It was found that under these conditions the number of contacts available was far too small in all of the four registers for the various translating operations necessary. We have therefore assumed the steering circuit should be designed in such a way as to energize both coils of a counter after it has received its series of pulses.* This insures the contacts on both sides and each stage then has the equivalent of two transfer contacts and two additional contacts somewhat similar to a switchhook connection. Thus each stage may be considered as a relay with the contacts available indicated in Figure 3. Our circuit diagrams are drawn from this point of view.

For the convenience of the reader we will list the various translation codes used in the sender. The incoming brush selection depends only on the thousands digit and is given by the following table:

Incoming Brush Selection:   0      1      2      3      4
Thousands Digit:            0, 1   2, 3   4, 5   6, 7   8, 9

*See the memorandum "Circuit Arrangement for Counting Relay with Mechanically Independent Contact Springs," by B. D. Holbrook, MM-40-130-149, July 5, 1940, Case 22108-1.
The incoming group selection depends on both the hundreds and thousands digits and is given by the following:

[Table illegible in the scan: incoming group selection as a function of whether the thousands digit is odd or even and whether the hundreds digit is less than 5.]

The final brush selection depends only on the hundreds digit. We have the following code:

Hundreds Digit:          0, 5   1, 6   2, 7   3, 8   4, 9
Final Brush Selection:   0      1      2      3      4

It should be remembered that an incoming brush, incoming group, or final brush selection of n corresponds to n + 1 revertive pulses. The same remark applies to the tens and hundreds selection.

Digits are sent to a call indicator by series of positive and negative pulses, four for each digit. Two different codes are used for this, one for the thousands digit and the other for the hundreds, tens, and units. The thousands code is an additive one based on the numbers 1, 2, 4, and 8 as follows:

P.C.I. Code for Thousands Digit
[Table illegible in the scan: the symbols 0 and - for each thousands digit in quadrants I, II, III, and IV, with the corresponding additive numbers.]

The sum of the numbers corresponding to the columns in which a digit has the symbol - gives that digit, hence the additive property of the code. In this table I, II, III, and IV refer to the four pulses or quadrants. In the first and third quadrants 0 represents a ground and a - represents a positive pulse. In the even quadrants 0 means a light negative pulse and the -, a heavy negative pulse. We have chosen this representation of the code for comparison with the P.C.I. circuit in which four leads are grounded or not in accordance with the above table. Thus if the digit 8 is registered in the thousands place, leads II and III in a group I, II, III, IV are grounded. The presence or absence of these grounds is translated into positive or negative pulses by two relays TS and RS.

The hundreds, tens, and units P.C.I. code is also additive, based on the numbers 1, 2, 4, 5. Using the same conventions it is represented by the following table:

P.C.I. Code for Hundreds, Tens, and Units Digits

H, T, or U Digit        Quadrant
                        I     II    III   IV
1                       -     0     0     0
2                       0     -     0     0
3                       -     -     0     0
4                       0     0     -     0
5                       0     0     0     -
6                       -     0     0     -
7                       0     -     0     -
8                       -     -     0     -
9                       0     0     -     -
Corresponding Numbers   (1)   (2)   (4)   (5)

The circuit for the tens or units register is shown in Figure 4. The operation is quite obvious. In the case of a full mechanical call, if 6 for example were dialed in the tens place, the first six relays are locked in, which places a ground on the lead marked 6. These are connected through the revertive steering chain to the revertive counter which reaches this ground after the seventh revertive pulse. The presence of this ground operates a relay which opens the fundamental circuit and stops the pulsing. A ground is also put on leads II and III for a P.C.I. call. The operation of the P.C.I. circuit will be described later.

The thousands and hundreds register is shown in Figure 5 for the positive action circuit and in Figure 6 for the more economical circuit. In Figure 5, many of the contacts do double duty, translating both for P.C.I. and full mechanical calls. This is done through a relay P which is operated for a manual call and not for a mechanical call. In the hundreds register there were not enough contacts available in the fifth and tenth stages. The relays R and S are used to carry part of the contact load. This circuit is designed so that one and only one of the IB, IG, and FB leads is grounded for a given number. In case of a contact failure none would be grounded and the corresponding commutator would supposedly go to a telltale. In the circuit of Figure 6, on the other hand, more than one of the IB, IG, or FB leads may be grounded at the same time. Thus if the thousands digit is 8, both 8 and 4 in the IB group are grounded. If the back contact on 8 failed the revertive pulse counter would not stop the pulsing action at brush 8 as it should but would go on to the fourth brush.
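The additive property of the two P.C.I. codes can be checked with a short sketch. It is an illustration under stated assumptions, not a description of the relay translation: the weights are taken to apply to quadrants I to IV in the order listed (1, 2, 4, 8 for thousands and 1, 2, 4, 5 for hundreds, tens, and units), and, because some digits can be formed from 1, 2, 4, 5 in more than one way, a largest-weight-first rule is assumed to pick one. The function and constant names are hypothetical.

```python
QUADRANTS = ("I", "II", "III", "IV")
THOUSANDS_WEIGHTS = (1, 2, 4, 8)
HTU_WEIGHTS = (1, 2, 4, 5)


def additive_code(digit, weights):
    """Return one symbol per quadrant: '-' where a weight is used, '0' elsewhere.

    Assumption: weights are chosen greedily, largest first, so the symbols
    marked '-' always sum to the digit.
    """
    symbols = ["0"] * len(weights)
    remaining = digit
    for idx in sorted(range(len(weights)), key=lambda i: weights[i], reverse=True):
        if weights[idx] <= remaining:
            symbols[idx] = "-"
            remaining -= weights[idx]
    if remaining != 0:
        raise ValueError(f"{digit} cannot be formed from weights {weights}")
    return "".join(symbols)


def grounded_leads(code):
    """Leads carrying a ground, taking the symbol '0' to mean a grounded lead."""
    return [QUADRANTS[i] for i, symbol in enumerate(code) if symbol == "0"]


for digit in range(1, 10):
    print(digit, additive_code(digit, THOUSANDS_WEIGHTS), additive_code(digit, HTU_WEIGHTS))

print(grounded_leads(additive_code(6, HTU_WEIGHTS)))
```

For a 6 in the hundreds, tens, or units place the sketch gives the code -00-, with leads II and III grounded, in agreement with the tens-register example in the text.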
However, this circuit is considerably simpler than Figure 5, and does not appear worse from the standpoint of possible wrong numbers than the present type of sender.

The P.C.I. circuit is shown in Figure 7. Z is a relay which is operated in the odd quadrants and not in the even quadrants. TS and RS are relays whose windings are connected sequentially through the P.C.I. impulse chain to first the thousands P.C.I. leads I, II, III, and IV, then the hundreds, etc., according to the following table:

Pulsing Stage   Z   TS       RS
1               Z   Th I     Th II
2                   Th III   Th II
3               Z   Th III   Th IV
4                   H I      Th IV
5               Z   H I      H II
6                   H III    H II
7               Z   H III    H IV
8                   T I      H IV
9               Z   T I      T II
10                  T III    T II
11              Z   T III    T IV
12                  U I      T IV
13              Z   U I      U II
14                  U III    U II
15              Z   U III    U IV
16                           U IV

In the odd quadrants Z is operated, placing a ground on the fundamental ring (FR). The fundamental tip (FT) is connected through Z to either ground or positive battery according as TS is operated or not. This depends of course on the condition of the P.C.I. lead to which TS is connected at the time. Similarly in the even quadrants light or heavy voltage is applied to FR according to the condition of RS while FT is grounded.

Figure 8 shows the revertive steering chain and revertive pulse counter.

C. E. SHANNON

[Figures, Bell Telephone Laboratories drawings not reproduced in this text; legible titles: Fig. 3; Fig. 4, Tens or Units Register; Fig. 7, P.C.I. Circuit.]

A STUDY OF THE DEFLECTION MECHANISM AND SOME RESULTS ON RATE FINDERS
by Claude E. Shannon

[Stamp, largely illegible: "This is a Final ..."]

SUMMARY OF THE MOST IMPORTANT RESULTS

1. The deflection mechanism may be divided into three parts. The first is driven by two shafts and has one shaft as output, which feeds the second part. This unit has a single shaft output which serves as input to the third part, whose output is also a single shaft, used as the desired azimuth correction.

2. The first unit is a simple integrator. Its output rate is [expression illegible in the scan].

3. The second part is the same circuit as previous rate finders. Its presence appears to be detrimental to the operation of the system from several standpoints. The output e of this part satisfies e = x + y [remainder of the expression illegible in the scan].

4. The third and most important part of the machine satisfies [equation illegible in the scan; it relates q and its derivatives to e through the constants R, L, S], in which:
e = an input forcing function which, except for transients in the second part and other small effects, is the function whose rate is to be found.
q = the rate of e as found by the device. The output of the mechanism is sin^{-1} q.
R, L, S are positive constants depending on the gear ratios, etc. in the machine.

5. The mechanism therefore acts like an R, L, C circuit in which the differential inductance is a function of the current [expression partly illegible; it involves √(1 - q^2)]. The system can be critically damped for differential displacements near at most two values of the current.

6. Omitting the effect of backlash, the system is stable for any initial conditions whatever, with a linear forcing function, e = At + B. It will approach asymptotically and possibly with oscillation a position where q is proportional to the rate of e. An error function can be found which decreases at a rate -R (q - q_0)^2, q_0 being the asymptotic value of q.

7. If the system is less than critically damped, ordinary gear play type of backlash can and will cause oscillation.
This includes play in gears, adders, lead screws, rack and pinions and looseness of balls in the integrator carriages. The oscillation is not unstable in the sense of being erratic, or growing without limit, but is of a perfectly definite frequency and amplitude. This type of backlash acts exactly like a peculiar shaped periodic forcing function. Approximate formulas for the frequency and amplitude of the oscillation are [formulas illegible in the scan], B_1 and B_2 being the amounts of backlash in the two driven shafts as measured in a certain manner.

8. Elastic deformations of shafts and plates can be divided into two parts. One is exactly equivalent to the gear type of backlash and may be grouped with B_1 and B_2 above. The other has the effect of altering the parameters R, L, S of the circuit and also adding higher order derivatives with small coefficients. This will slightly alter the time constant and the natural frequency of the system.

9. The manner in which the arcsin function is obtained seems to me distinctly disadvantageous to the operation of the system for a number of reasons, chiefly since to eliminate backlash oscillation it requires high overdamping near q = 0 and this slows down the response for low target speeds.

10. The general problem of rate finding and smoothing is considered briefly from two angles: as a problem in approximating a certain given transfer admittance and as a problem in finding the form of a differential equation. The first method, based on a linear differential equation, leads to tentative designs which I think would be an improvement over the present one. The second method indicates the possibility of still more improvement if non-linear equations can be satisfactorily analyzed.

ANALYSIS OF THE DEFLECTION MECHANISM

General Considerations. The deflection mechanism is a device designed to find the deflection mechanically from a formula [illegible in the scan] involving Σ_a and t_p, having one shaft whose rate of turning is Σ_a and another whose angular position is [illegible; it involves t_p], and giving the result as the position of a shaft. The system is also supposed to smooth out small errors in Σ_a. The mechanism, as actually constructed, is shown in Figure 1. By a rearrangement of adders, it may be drawn as shown in Figure 2. Incidentally, the device of rearranging and combining adder units is frequently useful in studying these systems. In this case it both clarifies the physical operation and simplifies the mathematical analysis. The box IV on the right of Fig. 1 represents two adders with, essentially, a common shaft. The output is equal to the sum of the inputs with the indicated signs prefixed.

A variable associated with a shaft represents the angular position of that shaft unless specifically stated otherwise. Gears are omitted from the diagram but included as coefficients in the equations.

It may also be worthwhile to point out that the best method of setting down the equation of such a system is usually the following:

1. Considering only the integrators and function devices, label the various shafts using the minimum number of variables, working backward from driven to driving shafts. Thus if the output of an integrator is labeled z, its displacement is dz/dt (assuming constant disk rate). If the output of an x to ln x gear is sin u, its input is e^{sin u}. Working backwards gives the differential instead of the integral form of the equation.

2. Now concentrate on the adders, grouping together as many as possible, and write the equations of constraint. These will be the equations of the system.

I find the use of electrical analogues very useful in understanding these devices and have used throughout a notation which emphasizes this idea.

As the machine is drawn in Fig. 2, it consists of three independently operating units.
The output of the first is a single shaft serving as input to the second, the output of the second a single shaft feeding the third, and the output of this being a shaft used as the azimuth correction [symbol illegible in the scan]. The operation is roughly as follows. Integrator I multiplies its disk rate by its displacement, so that the rate of turning of its output is $\dot{y}$ = [product illegible in the scan]. The actual position of this y shaft can carry no significance. It is the time integral of that product plus a constant $y_0$, a variable which depends on the entire previous history of the sighting telescopes, to say nothing of possible integrator slippage. At two different times, with a target at the same position and speed, this shaft would have entirely different angular positions but the same rate of turning.

The output of integrator I feeds into the middle part of the system, which is exactly the rate finder of most older directors. This part of the device seems to me not only superfluous but actually detrimental to the operation. It is equivalent to an R, L circuit (Fig. 3) with impressed voltage y and output x, the voltage across the inductance. [Text missing in the scan; the surviving text resumes at the third of the components of x.]

3. A small response h(t) for the function g(t). High frequencies in g(t) appear practically undiminished and in the same phase in h(t) since the impedance is high compared to R.

Thus

$$x = \frac{L_1}{R_1} A + K e^{-\frac{R_1}{L_1} t} + h(t)$$

In adder III, x is added to y in equal proportions to give e:

$$e = y + \frac{L_1}{R_1} A + K e^{-\frac{R_1}{L_1} t} + h(t)$$

As we pointed out above, y already contains an irrelevant additive constant, so the addition of another, $\frac{L_1}{R_1} A$, which happens to be proportional to the target rate, is of no possible significance. The term $K e^{-\frac{R_1}{L_1} t}$ certainly is only detrimental, being an unwanted transient. For a time I thought that the reason for the middle part of the machine was the final term h(t). For high frequencies this is approximately g(t), and might be used to buck out these high frequency following errors, much as was done in some early radio circuits to reduce a-c hum.
However, a study of the design diagrams shows that the two error functions are actually in phase, as I have indicated in the equation, so that these high frequency errors are added, making the situation worse. Even if the phase of x were reversed on entering adder III, I think it doubtful whether the presence of this part of the system would be justifiable. It would be necessary to show that the frequencies were high enough so that the two actually did cancel, and also that the disadvantages of the transient term did not overcome the advantages obtained.

Note that the middle part can function in no way as a rate finder. The right hand part of the machine does its own rate finding, as we will see, and the rate found by the middle part could not possibly be used because of the undetermined constant in y.

We proceed now to the third part of the machine, which is the major concern of the study. Concentrating on the adder IV, the equation of the system is obviously

$$L \frac{d}{dt} \sin^{-1} \dot{q} = e - S q - R \dot{q}$$

or

$$S q + R \dot{q} + \frac{L \ddot{q}}{\sqrt{1 - \dot{q}^2}} = e$$

This is the equation of a series R, L, C circuit with the inductance a function of the current passing through it. Inductance may be defined by the Lagrangian equations or by [defining equation illegible in the scan], and it is clear from the above equation that

$$\Lambda i = L \sin^{-1} i \quad \text{or} \quad \Lambda = \frac{L \sin^{-1} i}{i}$$

This function varies as shown in Figure 4. For our work, however, a more useful parameter is what is sometimes called the differential inductance, which may be defined by

$$L_D = \frac{d(\Lambda i)}{di}$$

so that in our case

$$L_D = \frac{L}{\sqrt{1 - i^2}}$$

This inductance is useful when we have an equilibrium current $\dot{q}_0$ and are considering the effect of small variations about this equilibrium. Omitting second order terms, the system will be equivalent to one with constant R, L, C parameters, the inductance being taken as $L_D$. The variation of $L_D$ with current is shown in Figure 5. The action is the opposite of that of a "swinging" choke where, because of saturation, the differential inductance decreases with large currents.
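The two inductances can be compared numerically. The following is a minimal sketch, assuming the inductive term is the time derivative of $L \sin^{-1} i$ with the current i standing for $\dot{q}$; the function names are illustrative and not part of the report.

```python
import math


def flux_linkage(i, L=1.0):
    """Lambda * i = L * arcsin(i); its time derivative is the inductive term."""
    return L * math.asin(i)


def ordinary_inductance(i, L=1.0):
    """Lambda = L * arcsin(i) / i, with the limiting value L at i = 0."""
    return L if i == 0.0 else L * math.asin(i) / i


def differential_inductance(i, L=1.0):
    """L_D = d(Lambda * i)/di = L / sqrt(1 - i^2), valid for |i| < 1."""
    return L / math.sqrt(1.0 - i * i)


def numerical_slope(i, L=1.0, h=1e-6):
    """Central-difference slope of the flux linkage, as a check on L_D."""
    return (flux_linkage(i + h, L) - flux_linkage(i - h, L)) / (2.0 * h)


if __name__ == "__main__":
    for current in (0.0, 0.3, 0.6, 0.9):
        print(current,
              ordinary_inductance(current),
              differential_inductance(current),
              numerical_slope(current))
```

Both quantities equal L at zero current and grow with the magnitude of the current, the differential inductance growing faster, which is the behavior opposite to a saturating choke.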
The mechanical idea behind the operation of this system is quite simple. Suppose shaft e to be turning at a constant rate. The system will be in equilibrium if the displacement of integrator V is such as to make its output feeding into the adder equal and opposite to e, and the displacement of integrator VI is at zero. Under these conditions, shaft $\dot{q}$ measures the rate of e and shaft V, the output of the device, the arcsin of this rate. If the rates are not correct, the adder changes the second derivative shaft in such a direction as to equalize the rates. The $\dot{q}$ shaft serves as a damper to prevent continual oscillation about the equilibrium position.

MATHEMATICAL THEORY (Backlash not Present)

Differential Operation

If e is turning at a constant rate and the system is at equilibrium, and then a small differential disturbance is applied to the system, it will clearly respond very nearly like an R, L, C circuit with constant parameters, the inductance used being the differential inductance for the equilibrium current,

$$L_{eff} = \frac{L}{\sqrt{1 - \dot{q}_0^2}}$$

Such a system has a time constant of

$$T = \frac{2 L_{eff}}{R} = \frac{2L}{R\sqrt{1 - \dot{q}_0^2}}$$

It is critically damped if

$$R^2 = 4 L_{eff} S = \frac{4 L S}{\sqrt{1 - \dot{q}_0^2}}$$

which, of course, only occurs at

$$\dot{q}_0 = \pm\sqrt{1 - \frac{16 L^2 S^2}{R^4}}$$

For values of $\dot{q}_0$ greater in absolute value than this, the system is oscillatory; for values less, overdamped.

Proof of General Stability with Linear e

In proving the stability of this system, I have used a method which may be new in some respects. It was suggested by the fact that in a non-dissipative mechanical system, the potential energy U is a minimum at a point where the system is differentially stable, and the method is, in a sense, a generalization of that criterion. It is not, however, limited to differential stability, or to non-dissipative systems. Since the method may be of use in other investigations of this type, I will first describe it in general terms.
Suppose we have a differential equation system in which n variables and derivatives may be specified independently in the initial conditions. We will say that the system is stable for all initial conditions and all driving functions if any two solutions of the system with the same driving functions approach each other in the sense that

$$\lim_{t \to \infty} \sum_{i=1}^{n} |x_i - y_i| = 0$$

where $x_1(t), x_2(t), \ldots, x_n(t)$ is one solution and $y_1(t), \ldots, y_n(t)$ the other. If this limit is zero for certain types of driving functions, we will say the system is stable for these functions.

Theorem: If a continuous function $Q(x_1 \ldots x_n, y_1 \ldots y_n, t)$ can be found having the following properties:

1. $Q \ge 0$ for all $x_i$, $y_i$, t, the equality holding if and only if $x_i = y_i$.

2. $\frac{dQ}{dt} \le 0$ at all times, when the $x_i$ and $y_i$ are solutions of the system with the same driving function.

3. It is impossible for Q to remain indefinitely $\ge A > 0$.

Then the system is completely stable. For the function Q is non-increasing but always $\ge 0$ and must therefore approach a limit $A \ge 0$ as $t \to \infty$, but by 3, $A > 0$ is impossible, hence A = 0, and each $|x_i - y_i| \to 0$.

Conversely, it can be shown that if only a single forcing function is involved, and the system is stable for this function, a Q exists of the type described.

Roughly, the method is to find a "distance" or "error" function Q between two solutions which is zero only when the solutions are identical and which always decreases.

As an example of this method it is easy to prove the complete stability of the ordinary R, L, C circuit with constant parameters without solving the equation. The differential equation is

$$S q + R \dot{q} + L \ddot{q} = e$$

and we choose q and $\dot{q}$ as coordinates. Let two solutions be $q_1, \dot{q}_1$ and $q_2, \dot{q}_2$ and consider the function

$$Q = \frac{S}{2}(q_1 - q_2)^2 + \frac{L}{2}(\dot{q}_1 - \dot{q}_2)^2$$

Condition 1 is obviously satisfied. Now

$$\frac{dQ}{dt} = S(q_1 - q_2)(\dot{q}_1 - \dot{q}_2) + L(\dot{q}_1 - \dot{q}_2)(\ddot{q}_1 - \ddot{q}_2) = -R(\dot{q}_1 - \dot{q}_2)^2 \le 0$$

[One page of the original is missing from the scan here. The text resumes partway through the stability proof for the deflection mechanism with e = At + B, after the definition of the function Q for that case.]

Obviously the minimum of Q with respect to q occurs at

$$q = \frac{At + B}{S} - \frac{RA}{S^2}$$

Also

$$\frac{\partial Q}{\partial \dot{q}} = \frac{L\left(\dot{q} - \frac{A}{S}\right)}{\sqrt{1 - \dot{q}^2}}$$

which vanishes only for $\dot{q} = \frac{A}{S}$. It is readily verified that this is a minimum, and that Q is zero at this point for any t. Now

$$\frac{\partial Q}{\partial t} + \frac{\partial Q}{\partial q}\dot{q} = S\left(q - \frac{At}{S} - \frac{B}{S} + \frac{RA}{S^2}\right)\left(\dot{q} - \frac{A}{S}\right)$$

and

$$\frac{\partial Q}{\partial \dot{q}}\ddot{q} = \left(\dot{q} - \frac{A}{S}\right)(At + B - Sq - R\dot{q})$$

if q and $\dot{q}$ satisfy

$$S q + R \dot{q} + \frac{L \ddot{q}}{\sqrt{1 - \dot{q}^2}} = At + B$$

Hence

$$\frac{dQ}{dt} = \left(Sq - At - B + \frac{RA}{S}\right)\left(\dot{q} - \frac{A}{S}\right) + \left(\dot{q} - \frac{A}{S}\right)(At + B - Sq - R\dot{q}) = -R\left(\dot{q} - \frac{A}{S}\right)^2$$

Note that this rate is identical with that found in the linear case. Incidentally, it was by working backward from this rate that a suitable function Q was first found.

For Q to approach a limit K > 0, it is necessary for $\dot{Q}$ to approach zero, and q, therefore, to approach a linear function of t differing by a constant from its equilibrium value. But from the original differential equation $\ddot{q}$ must then approach a constant different from zero, which contradicts $\ddot{q} \to 0$. This does not, however, quite complete the stability proof, due to a certain mechanical peculiarity of the system.

Let us plot the equilevel lines of Q against axes $X = q - \frac{At}{S} - \frac{B}{S} + \frac{RA}{S^2}$ and $Y = \dot{q}$ (Figure 6). The x to sin x gear in the actual mechanism has a limited movement, and is prevented from going too far by a slip clutch and stop. If $|\dot{q}|$ reaches the limit K, the stop prevents $|\dot{q}|$ from increasing any more. The original equation is replaced by [equation illegible in the scan] until the pressure on the stop reverses. So far we have shown that under the original equation Q always decreases. In terms of our plot this means that if we start a solution inside the curve marked C, the solution will certainly converge to the equilibrium position, for the solution can never "escape" from C and hit one of the two lines $Y = \pm K$, where the differential equation changes.
When we are not on one of these lines a solution will, in fact, spiral inward in the clockwise sense, as may be seen by writing the differential equation in the form

$$S\left(q - \frac{At}{S} - \frac{B}{S} + \frac{RA}{S^2}\right) + R\left(\dot{q} - \frac{A}{S}\right) = -\frac{L \ddot{q}}{\sqrt{1 - \dot{q}^2}}$$

Consider the signs of X and $(\dot{q} - A/S)$ in the four quadrants about the equilibrium position. In I, for example, $(\dot{q} - A/S) > 0$ and the X coordinate of a solution must increase with t; $\ddot{q} < 0$, so $\dot{q}$ must decrease, giving a clockwise sense to the motion. Similarly the other quadrants may be verified.

Some of the solutions starting outside of C will hit one of the lines, but the solution will still be stable. It is easy to show, by a study of the signs of the variables and their rates, that a solution can only hit the upper line to the left of the point $P_1$ with coordinates $X = \frac{R}{S}\left(\frac{A}{S} - K\right)$ and $Y = K$, and that if one does, it will move along the line to the right until it reaches $P_1$ and then return to the original equation. A similar situation holds for the lower line. If we should start a solution on the upper line to the right of $P_1$ it would leave the line immediately. The solution is always horizontal (i.e. $\ddot{q} = 0$) on the line through $P_1$, the equilibrium point, and $P_2$.

If R = 0 the function Q is constant since $\frac{dQ}{dt} = 0$, and therefore the solutions of the equation

$$S q + \frac{L \ddot{q}}{\sqrt{1 - \dot{q}^2}} = At + B$$

are the equilevel curves in Figure 6.

I have attempted in several different ways to generalize this proof for arbitrary input functions e(t), but so far have no completely rigorous proof. However, some of the arguments come so near as to make me almost certain of complete stability. It can be shown, for example, that two different solutions with the same e(t) cannot definitely diverge; i.e. $|q_1 - q_2| + |\dot{q}_1 - \dot{q}_2|$ cannot become and remain greater than some positive constant (assuming e and $\dot{e}$ bounded). Also, if two solutions get close together (with respect to both q and $\dot{q}$), they will certainly converge.
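The stability argument can be illustrated numerically. The following is only a sketch under stated assumptions: the equation is taken as $S q + R \dot{q} + L \ddot{q}/\sqrt{1 - \dot{q}^2} = At + B$, the parameter values are arbitrary illustrative choices, the integrator is a classical fourth-order Runge-Kutta step, the stop at $|\dot{q}| = K$ is not modeled, and the closed form used for Q is an assumed one chosen to agree with the partial derivative $L(\dot{q} - A/S)/\sqrt{1 - \dot{q}^2}$ quoted in the text, since the original definition does not survive in the scan.

```python
import math


def simulate(S=1.0, R=2.0, L=0.5, A=0.3, B=0.0,
             q=0.0, v=0.0, dt=0.01, steps=4000):
    """RK4 integration of S*q + R*q' + L*q''/sqrt(1 - q'^2) = A*t + B.

    v stands for q'. Returns a list of (t, q, v, Q) samples.
    """
    a = A / S  # equilibrium value of q'

    def accel(t, q, v):
        # The equation solved for q''; meaningful only while |v| < 1.
        return (A * t + B - S * q - R * v) * math.sqrt(1.0 - v * v) / L

    def error_function(t, q, v):
        # Assumed form of Q: zero at equilibrium, with
        # dQ/dq' = L*(q' - a)/sqrt(1 - q'^2) as stated in the text.
        x = q - (A * t + B) / S + R * A / S ** 2
        kinetic = L * (math.sqrt(1.0 - a * a) - math.sqrt(1.0 - v * v)
                       - a * (math.asin(v) - math.asin(a)))
        return 0.5 * S * x * x + kinetic

    t = 0.0
    samples = [(t, q, v, error_function(t, q, v))]
    for _ in range(steps):
        k1q, k1v = v, accel(t, q, v)
        k2q = v + 0.5 * dt * k1v
        k2v = accel(t + 0.5 * dt, q + 0.5 * dt * k1q, v + 0.5 * dt * k1v)
        k3q = v + 0.5 * dt * k2v
        k3v = accel(t + 0.5 * dt, q + 0.5 * dt * k2q, v + 0.5 * dt * k2v)
        k4q = v + dt * k3v
        k4v = accel(t + dt, q + dt * k3q, v + dt * k3v)
        q += dt * (k1q + 2.0 * k2q + 2.0 * k3q + k4q) / 6.0
        v += dt * (k1v + 2.0 * k2v + 2.0 * k3v + k4v) / 6.0
        t += dt
        samples.append((t, q, v, error_function(t, q, v)))
    return samples


if __name__ == "__main__":
    t, q, v, Q = simulate()[-1]
    print("final rate q' =", v, " expected A/S = 0.3, Q =", Q)
```

With these values the run starts away from equilibrium, the found rate settles at A/S, and the sampled Q never increases, in line with the rate $-R(\dot{q} - A/S)^2$ derived above.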
The Effect of Backlash

In order to understand how backlash can cause oscillation, let us first consider a much simplified case. Suppose we have a second order linear system which is less than critically damped, with no backlash (Figure 7):

$$S q + R \dot{q} + L \ddot{q} = e$$

If, at t = 0, we suddenly impress e = E (constant) on the system ($q = \dot{q} = 0$), the response is a damped oscillation (Figure 8).

Now in the mechanical system there are only two driven shafts, A and B, and backlash only affects (directly) the operation of these. Probably the greatest backlash is in the adder system driving shaft A. Let us assume for a moment that this is the only backlash present and that its action is as follows. When shaft A reverses direction [condition illegible in the scan] there is a short period during which the shaft is held stationary, the amount of backlash being $B_1$ as measured from the e shaft [passage partly illegible in the scan]. It is clear that the response of the system with backlash is the same as the response would be if there were no backlash and, at the time of reversal (say the shaft was previously decreasing and is about to increase), we turn the e shaft by an amount $B_1$ in such a way as to keep the driven shaft constant during this turning. Similarly, at the next reversal we give e a further increment of magnitude $B_1$, keeping the driven shaft constant through the period of backlash. In other words, the response is that of a system without backlash on which we impress as forcing function a wave which is about as shown in Figure 9.

If the periods of backlash are comparatively short, the small connecting portions (actually quadratic polynomials in time) will have little effect on the response. That is, we can assume a square topped wave with little error in $\dot{q}$, or q especially, due to the smoothing operation of the integrators (or, said another way, due to the high impedance of the circuit to high frequencies).

Now suppose that there is a certain amount of backlash in shaft B.
The action of this is to cause the carriage of the upper integrator to remain stationary for a small period when [condition illegible in the scan]. The same effect would be achieved if, at this time, we suddenly impressed on e a pulse which held the lower integrator at zero and kept changing e at such a rate as to keep the lower integrator there. We keep the integrator at zero long enough so that its output would have turned an amount equal to the backlash in B and then suddenly return it to its proper value. This means that the area of the pulse must equal the backlash. The shape of this pulse would be a linear function of time, but here again it is not highly significant. The entire system may thus be replaced by one which is free of backlash and subject to a driving function of the type shown in Figure 10, where $B_1$ is the backlash in A as measured from e and $B_2$ is the amount in B as measured from e (in the sense that if e covers an area $B_2$, shaft B moves an amount equal to its backlash). It is easy to see from our diagram that this forcing function is in the correct phase to sustain the oscillation against decay.

The fundamental component of this forcing function is easily found. We have

$$A_1 = \frac{2}{T}\int_0^T e \sin\frac{2\pi t}{T}\, dt$$

e may be split into a sum: one term for the square wave and one for the pulse-like $B_2$ part. The $B_2$ pulse is all concentrated near the center of the sine wave, where it is nearly unity. Hence approximately

$$A_1 = \frac{4}{T}\int_0^{T/2} \frac{B_1}{2} \sin\frac{2\pi t}{T}\, dt + \frac{4 B_2}{T} = \frac{2 B_1}{\pi} + 4 f_0 B_2$$

The period T of this oscillation is the natural damped period of the system, to within a small error of size comparable to the length of time during which backlash is effective. Hence its frequency is approximately

$$f_0 = \frac{1}{2\pi}\sqrt{\frac{1}{L_D C} - \frac{R^2}{4 L_D^2}}$$

and the magnitude of the fundamental component of the response $\dot{q}$ is

$$\frac{\dfrac{2 B_1}{\pi} + 4 f_0 B_2}{\sqrt{R^2 + \left(\omega_0 L_D - \dfrac{1}{\omega_0 C}\right)^2}}$$

where $\omega_0 = 2\pi f_0$ and C = 1/S is the capacitance of the analogous circuit. Providing the quantity $\frac{2 B_1}{\pi} + 4 f_0 B_2$ is small, the deflection mechanism will behave linearly about its equilibrium position and the above formulae would approximately hold. If $|\dot{q}_0| \ne 0$, the equilibrium value of inductance $\frac{L}{\sqrt{1 - \dot{q}_0^2}}$ would probably be as good as any to use, since the differential inductance is greater on one side and less on the other. At $\dot{q}_0 = 0$ the inductance is greater on each side and a somewhat higher value should be used, depending on $\frac{2 B_1}{\pi} + 4 f_0 B_2$.

If the system is more than critically damped, q may or may not have an inflection point, depending on the initial conditions. If they are such that the driven shafts do not reverse, backlash cannot take effect and there should be no oscillation. However, if they do reverse once, the system may receive the equivalent of a "kick" in such a direction as to cause another reversal and so on, so that oscillation is set up. This problem has not been very well decided, but if this happens, the amplitude formula above should still hold, while the frequency formula will not.

The question of "spring backlash", i.e. undesired effects due to elastic deformations of shafts and mounting plates, has been raised. According to Hooke's Law the angular strain in a shaft is proportional to the applied torque. This torque in a shaft whose position is x(t) can probably be very well approximated by an equation of the form

$$T = \pm K_1 + K_2 \dot{x} + K_3 \ddot{x}$$

the first term, whose sign is that of $\dot{x}$, being due to a coulomb friction load, the second to a viscous friction load, and the third an accelerating torque. It is clear that the coulomb friction term $K_1$ can be combined with the ordinary gear type backlash treated above, and acts, therefore, like a periodic forcing function. The effect of the other terms is quite different. Their presence causes small changes in the parameters R, L and S of the circuit and also adds higher derivatives to the equation. Let us consider only the spring in the shafts feeding $L\ddot{q}$ (i.e. assume $\dot{q}$ rigidly driven). [The modified equation is illegible in the scan. Its left-hand side carries small correction terms in the derivatives of q, and its right-hand side is a modified forcing function $e_1(t)$ formed from e and its first two derivatives.]

Spring in the drive to $\dot{q}$ has a similar effect, although complicated by the non-circular sine gears. If e is a linear function of t, so is $e_1$, and the forcing function thus contains nothing to create a sustained oscillation. The left-hand side differs only by small quantities from the ideal equation

$$S q + R \dot{q} + \frac{L \ddot{q}}{\sqrt{1 - \dot{q}^2}} = e_1$$

and will therefore surely approach the solution [expression illegible in the scan]. Thus we see that the "spring type" of backlash cannot cause sustained oscillation as the "gear" type of backlash can. However, if the gear type is present, the spring type can aid oscillation by reducing the damping. It may be necessary to overdamp in some cases in order to get an effective critical damping.

It should be pointed out that the gear type of backlash may not be quite as simple as we have assumed, particularly in the shafts driving $\dot{q}$. If the integrator carriage load is large compared to the friction loads in the adders and gears, then we are probably justified in assuming that gear pressures in the drive only reverse when the driven shaft reverses. However, if this is not the case, a backlash effect can easily take place at other times, for example when one of the shafts feeding the adder reverses, without necessarily reversing the driven shaft. The situation could become quite complicated, the equivalent input function containing several different sized steps occurring at different times. However, the fundamental frequency should still be approximately the natural damped frequency of the system, providing the backlash effects are small and occur only during a small fraction of the time.
The fact that backlash can cause a sustained oscillation leads to a criticism of the design of the mechanism, in particular of the method whereby the arcsin function is obtained. Note that reducing the amount of gear backlash, $\frac{2 B_1}{\pi} + 4 f_0 B_2$, will reduce the amplitude of oscillation proportionately, but apparently the only way to eliminate it completely is to at least critically damp the system for all equilibrium points, so that the shafts do not, in general, reverse direction. In the deflection mechanism as it stands, this would be distinctly disadvantageous, for if we critically damp at the maximum values of $|\dot{q}|$ (the governing points) the system will be much over-damped near $\dot{q} = 0$, and in fact for most values of $\dot{q}$, due to the shape of the inductance curve.

Another related argument against the manner of getting the arcsin is that the response to high frequency error functions depends on the value of $\dot{q}$. It seems to me that the treatment of error functions should be independent of the target speed (what is best for one will be best for another), since the prediction error we can tolerate is an absolute quantity, not dependent on the target speed. There may be some objection to this argument on the grounds that at higher target speeds the error function is apt to be larger, and hence the circuit should have a larger impedance, but even so it would only be accidental if the peculiar variation introduced by the sine gear was anything like an approximation to the desired variation.

Finally, a minor argument against the position of the sine gear is that the equation becomes so difficult to handle mathematically. A design of this type must be largely intuitive or experimental; there is not much chance of choosing the constants for the best operation by a mathematical formulation, or of determining the speed of response etc. analytically.

These difficulties might be avoided in several ways. The arcsin might, for example, be introduced as in Figure 11.
No doubt the reason this was not done was because, with $|\dot{q}|$ near 1, running the sin x gear backward is not mechanically practical, the gearing up ratio being too great. This objection could be overcome in two ways: either a new gear, k arcsin x to x (k large), could be used and the parameters R, L, S all decreased by a factor of k (or the integrator disks might be speeded up in suitable ratios), or, if this were not mechanically feasible, a rapid response servo mechanism could be introduced in the output, Figure 12. This system can, by the way, be solved in closed analytic form when [symbol illegible in the scan] is a constant, and reduced to a quadrature in any case.

The essential feature of this circuit is that the functions of rate finding and smoothing, and of taking the arcsin, have been isolated. Each part can be designed to do its own job the best without compromise. It may be noted that the arcsin circuit above also performs a smoothing operation which depends on target speed. By suitable choice of the parameters we can make this large or small as we desire.

The Ideal Rate Finder and Smoother

Let us consider the problem of rate finding and smoothing from a general standpoint and ask what mathematical operation a machine should perform to act as the "best possible" rate finder. Of course, this question has many answers, depending chiefly on what assumptions we make as to the input function, and what mathematical limitations we put on the machine. We shall assume throughout that the input function e(t) consists of a series of linear parts with curved connecting portions and with a small superimposed error function, and that we only desire the rate during (that is, some time after the start of) a linear part. In this section we assume there are no limitations whatever on the machine: that we can build a machine to perform any operations we can describe, in particular those a mathematician might use to solve the problem.
Now there is considerable experimental and theoretical justification for the theory that the best way to fit a curve of a given type to a set of points subject to an observational error is in the least square sense. If we assume this to be true in our case, and attempt to fit a straight line to the last a seconds before $t_1$ of the curve e(t), we must minimize the integral

$$I = \int_{t_1 - a}^{t_1} \left[e - (At + B)\right]^2 dt$$

with respect to A and B. The quantity a represents the length of the curve used in the fitting process. We would like to use as much of the curve as actually represents a linear segment to get the best accuracy, but certainly no more. A person doing the curve fitting could look at e(t) and see fairly well where the curve showed a real tendency to depart from linearity, and select accordingly. Mathematically it could be done as follows. Suppose the standard deviation of the error is $\sigma$ and that errors of more than, say, $4\sigma$ are almost certainly due to a significant departure from linearity in the curve. We could choose a such that it is as large as possible without making the error

$$|e - (At + B)|, \quad t_1 - a \le t \le t_1$$

(A, B chosen to minimize I) greater than $4\sigma$. In other words, we use as much of the curve as we can assume linear within observational errors. As a final refinement of the solution it might be desirable to include a weighting function W(t, a) in the integral I, weighting the more recent values more heavily. The final evaluation of the rate is then the value of A given when we minimize the function

$$I(A, B, a) = \int_{t_1 - a}^{t_1} \left[e - (At + B)\right]^2 W(t, a)\, dt$$

on A and B, a fixed, giving A and B as functions of a, and then choose a as large as possible with

$$|e - (At + B)| \le K, \quad t_1 - a \le t \le t_1$$

This solution can be put into a more explicit form, but even when greatly simplified it appears that it would be quite difficult to carry out the calculations accurately by mechanical means.
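The procedure just described can be sketched on sampled data. This is a minimal illustration under assumptions that are not in the report: the curve e(t) is available as equally trusted samples, the integral is replaced by a sum, the weighting function is taken as unity, and the function names and the default k = 4 (for the $4\sigma$ rule) are illustrative.

```python
def fit_line(ts, es):
    """Least-squares straight line e = A*t + B through the samples."""
    n = len(ts)
    t_mean = sum(ts) / n
    e_mean = sum(es) / n
    s_tt = sum((t - t_mean) ** 2 for t in ts)
    slope = sum((t - t_mean) * (e - e_mean) for t, e in zip(ts, es)) / s_tt
    return slope, e_mean - slope * t_mean


def longest_linear_window(ts, es, sigma, k=4.0, min_points=3):
    """Fit the most recent samples, using the longest window whose
    worst residual stays within k*sigma.

    Returns (A, B, a), with a the time span of the chosen window,
    or None if no window of at least min_points samples qualifies.
    """
    best = None
    for n in range(min_points, len(ts) + 1):
        window_t, window_e = ts[-n:], es[-n:]
        A, B = fit_line(window_t, window_e)
        worst = max(abs(e - (A * t + B)) for t, e in zip(window_t, window_e))
        if worst <= k * sigma:
            best = (A, B, window_t[-1] - window_t[0])
    return best


if __name__ == "__main__":
    # A curved approach followed by a linear part of slope 2,
    # with a small alternating error superimposed.
    ts = [0.1 * k for k in range(101)]
    es = [2.0 * t + (3.0 * (5.0 - t) ** 2 if k < 50 else 0.0) + 0.01 * (-1) ** k
          for k, t in enumerate(ts)]
    print(longest_linear_window(ts, es, sigma=0.01))
```

The sketch has to keep the whole recent history of e in memory in order to refit at every window length, so it needs an exact record of the past of an arbitrary function.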
The main difficulty is that apparently such a machine must be capable of remembering exactly the past history of an arbitrary function, e or something derived from it. The only methods I know of doing this are quite inaccurate, or else very complex, and it seems likely that the gain in mathematical precision of the above formulation would be more than offset by a loss in mechanical precision.

Differential Analyzer Types of Machines

To become a bit more practical, let us now confine our attention to machines of what might be called the differential analyzer type. By this, we mean machines constructed of a finite combination of adders, integrators, and function elements (e.g. non-circular gears). Two shafts e(t) and kt enter the machine and one shaft u(t) leaves the machine. It can be shown that any such system must satisfy a differential equation of the type

$$f(q, \dot{q}, \ldots, q^{(n)}, t) = e(t)$$

with $u(t) = q^{(i)}$. First, we ask what can be said about the form of this equation to make the machine act as a satisfactory rate finder in our sense.

1. With the same initial conditions and the same e(t) the machine should certainly respond the same independent of the time of start, hence f does not depend on t.

2. When e = At + B the equation must have an equilibrium solution

$$q^{(i)} = A, \quad q^{(i+1)} = 0, \quad q^{(i-1)} = At + C$$

If i > 1, the carriage of an integrator will be continuously moving in the equilibrium condition. This does not seem practical, for the initial conditions may be anything depending on past history, and the integrator would surely go off scale in many cases. Obviously from the equilibrium solution, i is not 0, for this would imply a constant equal to a linear function of time. Hence i = 1 and $\dot{q} = u(t)$.

3. Let f(x, y) = f(x, y, 0, ..., 0). Due to the equilibrium solution

$$f(At + C, A) = At + B$$

for all A, B, t. Differentiating with respect to t,

$$\frac{\partial f(x, y)}{\partial x} A = A, \quad \frac{\partial f}{\partial x} = 1, \quad f(x, y) = x + h(y)$$

4. Assuming f is fairly "well behaved", we have near $\ddot{q} = \dddot{q} = \ldots = q^{(n)} = 0$ (i.e. near equilibrium)

$$f = f(q, \dot{q}, 0, \ldots, 0) + \ddot{q}\frac{\partial f}{\partial \ddot{q}} + \ldots = q + h(\dot{q}) + a_2 \ddot{q} + \ldots + a_n q^{(n)}$$

and the differential operation depends on the coefficients $a_2 \ldots a_n$ and $h(\dot{q})$. As this differential operation should not depend on t, the $a_i$ must be independent of q, for in equilibrium q changes with t. They may depend on $\dot{q}$, however, in which case the differential operation depends on the target speed, which may or may not be desirable. In the deflection mechanism this is the case, $a_2 = \frac{L}{\sqrt{1 - \dot{q}^2}}$.

5. With $\dot{q}$ near A the above reduces to

$$f = q + a_1 \dot{q} + a_2 \ddot{q} + \ldots + a_n q^{(n)} + b$$

where $a_1 = h'(A)$ and $b = h(A) - A h'(A)$. To eliminate backlash oscillation the roots of this equation should all be real, and for stability all should be negative, for all desired A.

6. For complete stability, there are no doubt further requirements on the form of f. This problem, however, is still unsolved.

The above are only requirements on the form of f so that it actually does find a satisfactory rate. To find the best form of f would require a very elaborate mathematical analysis, if possible at all.

If we restrict our machine still further and assume a linear differential equation with constant coefficients, it is possible to give a fairly rational analysis leading to the best values of the coefficients. The question is this. Given the equation

$$a_0 q + a_1 \dot{q} + \ldots + a_n q^{(n)} = e$$

what values of the coefficients $a_0 \ldots a_n$ give the best rate-finding smoothing properties? From what we said above, it seems that the characteristic equation

$$a_0 + a_1 p + \ldots + a_n p^n = 0$$

should have only real negative roots and that the rate found will be $\dot{q}$. We may normalize the equation by assuming $a_0 = 1$, so that $\dot{q}$ is actually the rate and not merely proportional to it. In the Heaviside symbolic notation, we have

$$\dot{q} = \frac{p}{(b_1 p + 1)(b_2 p + 1) \cdots (b_n p + 1)}\, e$$

writing the polynomial in the factored form. The $b_i$ are positive real numbers and are the time constants in the transient part of the response. We assume the $b_i$ arranged in increasing magnitude.
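The factored form lends itself to a quick numerical look at how such a rate finder treats a sinusoidal error of angular frequency $\omega$. The sketch below assumes the operator p is replaced by $j\omega$, so that the magnitude of the response of $\dot{q}$ to e is $\omega / \prod_k \sqrt{1 + b_k^2 \omega^2}$; the function name and the sample values are illustrative only.

```python
import math


def admittance_magnitude(omega, time_constants):
    """Magnitude of p / prod(b_k*p + 1) evaluated at p = j*omega."""
    denominator = 1.0
    for b in time_constants:
        # |1 + j*b*omega| = sqrt(1 + (b*omega)^2)
        denominator *= math.hypot(1.0, b * omega)
    return omega / denominator


if __name__ == "__main__":
    for omega in (0.01, 0.1, 1.0, 10.0):
        print(omega,
              admittance_magnitude(omega, [0.5, 1.0]),
              admittance_magnitude(omega, [1.0, 1.0]))
```

At low frequencies the magnitude is close to $\omega$, so the machine acts as a differentiator of slowly varying inputs, while raising any single time constant lowers the magnitude at every frequency.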
Let us frame the problem as follows. Keeping the speed of response of the circuit the same, what values of the $b_k$ give the best attenuation of the error function? Of course, the trouble appears in trying to decide what we mean by keeping the speed of response the same. One answer is that we keep the maximum time constant, that is $b_n$, the same. This may be partially justified on the following grounds:

1. For "almost all" initial conditions, the term $A_n e^{-t/b_n}$ will eventually dominate the transient response as $t \to \infty$, the other terms becoming arbitrarily small in comparison. The only time when this fails is when the coefficient happens to come out zero.

2. In the worst cases (other coefficients small in comparison) the $b_n$ term dominates for all t, and the machine should perhaps be designed with the worst conditions as governing.

3. If we use this criterion, it is easy to show that for best attenuation of error frequencies all the $b_k$ should be equal. For the magnitude of the transfer admittance (e to $\dot q$) is $\prod_k \dfrac{1}{\sqrt{1 + b_k^2 \omega^2}}$, which is obviously smallest when each $b_k$ is made as large as possible, for all frequencies. That is, each $b_k = b_n$, the maximum.

Another way the "same speed of response" might be interpreted is in terms of the expected area under the transient time curve. Keeping the standard deviation of this area constant seems to give the same evaluation of the $b_k$ as above but there are certain statistical assumptions in my proof that may render it invalid.

If the characteristic equation has real roots, it may be set up nicely as in Figure 13. This circuit appears to have an advantage from the backlash point of view over the more obvious one shown in Figure 14.

It seems quite possible, however, that the use of nonlinear equations could offer a real advantage. Consider the equation [equation illegible in the scan] where the three coefficients are functions of $q$. When the system is at equilibrium, it acts approximately like [equation illegible in the scan; the three coefficients are evaluated at 0], and these three constants could be adjusted to give critical damping and a good attenuation of the error function frequencies. On the other hand, when we are not at or near equilibrium, q is (usually) considerably different from zero. The values of the three coefficients could be adjusted to give a very rapid response, and thus approach the equilibrium position faster. It is possible, however, that there is some fundamental error in this reasoning, for example, that an attempt to do this would necessarily cause oscillation.

A HEIGHT DATA SMOOTHING MECHANISM

Claude E. Shannon

5/26/41

The schematic diagram of a new type of height data smoothing mechanism is shown in Figure 1. The discontinuous height data e(t) is fed into the input shaft at intervals. This drives a differential, connected also to the ball carriage and roller of an integrator whose disk is turned by a constant speed motor. A correcting hand wheel and the integrator roller feed another differential whose output is the output of the device. The output and input of the machine are compared through a differential feeding a dial. The operator is supposed to turn the handwheel in such a way that the positive and negative oscillations of the dial about zero are equal.

The actual height of the target h(t) is a continuous function of time and we may assume that just after each reading e(t) is an approximation to this. Thus h(t) and e(t) might be as shown in Figure 2. The shaft y(t) clearly satisfies the equation

(1) $y + \frac{1}{m} y' = e(t)$.

The x shaft satisfies

(2) $x(t) = y(t) + c(t)$

and the dial reads

(3) $D(t) = e(t) - x(t)$.
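Equation (1), $y + y'/m = e(t)$, is a first-order lag, so its behaviour is easy to sketch numerically. The following is a minimal illustration, not part of the report: it assumes e(t) is held constant for a seconds after each reading, and the function names and the sample readings are hypothetical.

```python
import math

def smooth_heights(readings, m, a, y0=0.0, samples_per_interval=10):
    """Follow y + y'/m = e(t) exactly while e(t) is held at each reading for a seconds."""
    dt = a / samples_per_interval
    decay = math.exp(-m * dt)  # factor by which the gap (y - e) shrinks per sample
    y, trace = y0, []
    for e in readings:
        for _ in range(samples_per_interval):
            y = e + (y - e) * decay
            trace.append(y)
    return trace

def dial_reading(e, y, c):
    """Equations (2) and (3): x = y + c, then D = e - x."""
    return e - (y + c)

readings = [100.0, 120.0, 150.0]  # hypothetical height readings, one per interval
a = 1.0                           # assumed interval between readings, in seconds
fast = smooth_heights(readings, m=math.log(10) / a, a=a)  # ma = ln 10
slow = smooth_heights(readings, m=math.log(2) / a, a=a)   # ma = ln 2
print(fast[-1], slow[-1])
```

With ma = ln 2 the gap between y and e halves over each interval, and with ma = ln 10 it falls to one tenth, which is why the second trace lags more but is smoother.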
During the period between height readings the position of the e(t) shaft is constant, say $e(t_n)$, the reading taken at $t_n$, so that

$y + \frac{1}{m} y' = e(t_n)$, $\quad y = e(t_n) + A_n e^{-m(t - t_n)}$, $\quad t_n < t < t_{n+1}$.

Since y is obviously continuous, it will follow a curve consisting of a series of connected exponentials, each with the same time constant, $1/m$. The continuity of the curve implies

$A_{n+1} = A_n e^{-m(t_{n+1} - t_n)} + e(t_n) - e(t_{n+1})$.

Assuming the intervals between readings the same, say a seconds, the response y for two different time constants, $ma = \ln 2$ and $ma = \ln 10$, are shown in Figure 3. The larger the time constant, the more the lag in response of y(t), but the smoother the curve. This may be seen another way: the e to y system is equivalent to an R, L circuit with position of shafts analogous to voltage as shown in Figure 4. With the time constant small, y follows e closely, including the irregularities. With the time constant large, y(t) is smooth compared to e but lags considerably.

Movement of the hand wheel does not affect y(t) but shifts x(t) up or down with respect to y. If the operator turns the wheel to give equal positive and negative movements of the dial, it may be seen that in the "steady state" (say with f(t) = at) there is a constant lag even when the damping is low and the interpolation nearly linear. In this case the system bridges linearly between the mid-ordinates of the steps, while actually it should bridge between the points $(t_n + 0)$. With higher damping the shape becomes worse but the interpolated exponentials are nearer to the true curve most of the time.

We shall find a formula for the best time constant of the system under the following assumptions:

1. That the "best" time constant is the one making the actual error least in the mean square sense.

2. That we may take as the true curve, so far as our knowledge goes, the linear interpolation between the points $t_n + 0$.
This may be justified by the fact that the device cannot in any way perform higher order interpolation: the curve y(t) is convex upward whenever e(t) increased in its last step over the final value of y from the preceding step, and this is quite independent of the curvature of e(t).

3. That the system is in a "steady state", that is, that in the step under consideration y(t) ends at the same distance below e(t) as it was just before the step.

4. That the steps come at approximately equal intervals of a seconds.

An interval under these conditions is shown in Figure 5. Here we assumed that the hand wheel was turned to give a ratio [illegible in the scan; it is expressed through a parameter k] as deflection of the dial just after to just before a step. We have

$y = A e^{-mt}$ with $y(0) = b + y(a)$, so that $A = b + A e^{-ma}$.

Hence

$y = \dfrac{b\, e^{-mt}}{1 - e^{-ma}}$.

[The expressions for x and for the integral of the squared error per second, written in terms of D and k, are illegible in the scan.]

It is evident from physical considerations that the minimum of this expression occurs for a fairly large D. In fact the error curve was plotted for k = .5 (Figure 6) and the minimum is seen to be at about 7 or 8. With D this large the above expression is very nearly equal to [expression illegible in the scan], since $e^{-D}$ is very small. To locate the minimum we set the derivative with respect to D equal to zero, which gives $(6 - 8k) D = 16$, whence

$D = \dfrac{8}{3 - 4k}$.

For $k = \frac{1}{2}$, $D = 8$. Since the minimum is so flat (Figure 6) this formula is certainly close enough. However a second approximation may be found as follows: for x small, $\dfrac{1}{1 - x} \approx 1 + x$.
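The first approximation $D = 8/(3 - 4k)$ can be checked numerically. The sketch below is an editorial illustration, not the report's own calculation: since the large-D error expression is illegible in the scan, it assumes the form $k^2 - k + \frac{1}{3} - \frac{3 - 4k}{2D} + \frac{2}{D^2}$ (mean squared error in units of $b^2$), chosen because its minimum falls exactly at $D = 8/(3 - 4k)$. The function names are hypothetical.

```python
import math

def mse_large_d(d, k):
    """Assumed large-D mean squared error in units of b**2 (an editorial reconstruction)."""
    return k * k - k + 1.0 / 3.0 - (3.0 - 4.0 * k) / (2.0 * d) + 2.0 / (d * d)

def best_d(k):
    """First approximation quoted in the report: D = 8 / (3 - 4k)."""
    return 8.0 / (3.0 - 4.0 * k)

def grid_minimum(k, lo=1.0, hi=40.0, steps=39000):
    """Brute-force check of the minimising D on a grid with spacing 0.001."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return min(grid, key=lambda d: mse_large_d(d, k))

k = 0.5
d_star = best_d(k)
rms_smoothed = math.sqrt(mse_large_d(d_star, k))
rms_direct = 1.0 / math.sqrt(3.0)  # a held step compared with a linear ramp of height b
print(d_star, grid_minimum(k), rms_smoothed / rms_direct)
```

Under this assumed expression the ratio of the two RMS errors at k = 1/2 comes out a little under 0.4, in line with the report's later remark that the error is reduced to about 40% of the direct-coupling value.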
Using this in the exact expression to eliminate the denominators we get a second approximation [equation illegible in the scan]. Using the first approximation to obtain the values involving exponentials, a better value may be obtained. For $k = \frac{1}{2}$ the second approximation is D = 8.03. The first and second approximations are plotted in Figure 7.

With $k = \frac{1}{2}$ the curve x(t) is plotted for an interval with the "best" D, in Figure 8. It will be noted that the curve is highly damped in comparison to the time between readings. The RMS error is then equal to [value illegible in the scan]. It is interesting to compare this with the RMS errors obtained under other conditions. If the device is not used at all, but a direct coupling made between the input and output, the RMS error between the step function and the linear interpolation between points $t_n + 0$ is

$\sqrt{\dfrac{1}{a} \int_0^a \left[ 0 - \left( -\dfrac{bt}{a} \right) \right]^2 dt} = \dfrac{b}{\sqrt{3}} = .577\, b$

so that the RMS error has been reduced to 40% of this value.

In Figure 9, the output of the smoothing mechanism, x(t), is plotted for a certain forcing function e(t), using the "best" value of m. It may appear that the output is still far from smooth, and this is in a sense true, but it must be remembered that the variations in e(t) are here greatly exaggerated over what would be expected in practice.

Finally it should be pointed out that a very material improvement in operation could be obtained if the operator were trained to turn the handwheel to obtain a ratio [illegible in the scan] nearer to zero than [illegible in the scan]. This, however, would probably be impractical.

[Figure residue; legible labels: DIAL, CORRECTING HAND WHEEL]

SOME EXPERIMENTAL RESULTS ON THE DEFLECTION MECHANISM Claude E.
Shannon June 26, 1941

In a previous report, "A Study of the Deflection Mechanism and Some Results on Rate Finders," a mathematical study was made of a new type of deflection mechanism. The present paper is a further study of this device and a report on some experimental results obtained on the M.I.T. differential analyzer. For convenience in reference, the schematic diagram of the machine is repeated in Fig. 1.

In the report mentioned, the utility of the middle part of the device was questioned. This arose from a misunderstanding of the basic assumptions underlying the design and was cleared up in a conference with Dr. Tappert. The writer's analysis was under the assumption that the mechanism was designed to find rates for linear forcing functions only (i.e., that higher order terms were small by comparison), and the analysis is still valid if this is true. However, in practice, it appears necessary to assume higher order forcing functions and the deflection mechanism is designed to give the correct steady state rate (except for the non-linearity of the sine gear) for an arbitrary quadratic forcing function. Actually the middle part (often referred to hereafter as the "x" part) of the device is certainly well worth while, as will be seen from some of our experimental curves.

If a linear mechanism has a transfer admittance $T(j\omega)$ from input e(t) to output $\dot q(t)$ then $Q(j\omega) = T(j\omega) E(j\omega)$ where E and Q are the transforms of e and $\dot q$. It is easily seen from transform theory that if e(t) = at + b, a necessary and sufficient condition that $\dot q(t) \to a$ as $t \to \infty$ is that $\dfrac{T(j\omega)}{j\omega} \to 1$ as $\omega \to 0$. If this condition is satisfied the system may be called a first order rate finder: after the transient has died out, the output is the derivative of the input whenever the latter is linear. Similarly if $T(0) = 0$, $T'(0) = j$, $T^{(k)}(0) = 0$, $k = 2, 3, \ldots$
, n we have an nth order rate finder: in the steady state it finds the rate of an nth degree polynomial forcing function. In the deflection mechanism we have a second order rate finder, $T(j\omega) = j\omega + c_3 \omega^3 + c_4 \omega^4 + \cdots$, if we assume [factor illegible in the scan] nearly 1. A circuit for solving [equation illegible in the scan] under the same approximation, to the nth order, is shown in Fig. 2. The admittance here is approximately [expression illegible in the scan]. With the values of the constants in the mechanism the admittance is [expression illegible in the scan; the legible coefficients are 4.63, 5.73 and 1.094].

In the previous report it was pointed out that due to a clutch and stop on the input to the sine gear, values of $\dot q$ were limited to two horizontal lines (see Fig. 6 in that report). There is also a clutch and stop on the displacement of the lower integrator. This effectively further limits solutions to a parallelogram as shown in Fig. 3. Actually the limitation is fictitious: the q shaft can turn an unlimited amount, but when this stop is in effect the stability point moves at such a speed as to be equivalent to q and [symbol illegible in the scan] moving along one side of the parallelogram. Thus if we keep the stable point stationary, paths of representative solutions will be as indicated in Fig. 3.

The trial solutions taken on the differential analyzer may be classified as follows:

I. Solutions taken with the mechanism as designed.
   A. Simple analytic forcing functions.
      1. $e(t) = a$
      2. $e(t) = at + b$
      3. $e(t) = at^2 + bt + c$
      4. $e(t) = at^3 + bt^2 + ct + d$
   B. Response for 8 typical target courses, the target vector velocity constant.
   C. The response to some error functions superposed on typical courses.
   D. An attempt to get backlash oscillation.
II. Approximately the same program, although less extensively, with the middle part eliminated.
III. A few runs with typical courses using three different third order rate finders.

The constants of the target courses used were as follows (see Fig.
4):

Course I: $S_g$ = 150 yds/sec = 307 mi/hr, $V_m$ = 2,000 yds, $h_m$ = 1,000 yds, angle = 0°
Course II: $S_g$ = 150 yds/sec, $V_m$ = 2,000 yds, $h_m$ = 500 yds, angle = 0°
Course III: $S_g$ = 150 yds/sec, $V_m$ = 4,000 yds, $h_m$ = 1,000 yds, angle illegible in the scan
Course IV: $S_g$ = 150, $V_m$ = 2,000, $h_m$ = 2,000, angle illegible in the scan
Course V: $S_g$ = 150, $V_m$ = 4,000, $h_m$ = 4,000, angle = -14.96°, V = 4,000 - 40 t
Course VI: $S_g$ = 150, $V_m$ = 2,000, $h_m$ = 1,000, angle = -14.96°, V = 2,000 - 40 t
Course VII: $S_g$ = 96.6, $V_m$ = 3,000, $h_m$ = 1,000, angle = -60°, V = 3,000 - 115 t
Course VIII: $S_g$ = 150, $V_m$ = 4,000, $h_m$ = 500, angle illegible in the scan

The distribution of these courses is indicated in Fig. 5, together with the approximate maximum range of the 3" A.A. gun (21 sec. fuse setting).

The actual input to the deflection mechanism is [expression illegible in the scan], but since it was desired to compare the actual output with the true deflection $\sin^{-1} \dot e$, the quantity $\dot e$ was plotted against t and integrated to provide the input. To calculate $\dot e$ the following method was found to be the simplest. We have [formula illegible in the scan]. A computation schedule was set up based on this formula, working backwards from the time of burst $t + t_p$ to the present time [schedule columns I to VII illegible in the scan; column IV, the time of flight $t_p$, is taken from ballistic curves].

The ballistic data used in getting $t_p$ (IV) was read from the chart Fig. 24 (opposite p. 59), Coast Artillery Field Manual, FM 4-110. The value of $t_p$ was merely read off corresponding to the computed values of $r_p$ and $h_p$.
If we assume as an approximation that the shell velocity is constant, k yds/sec (i.e., that the equi-time of flight curves in the chart are circles), so that [relations illegible in the scan] with V constant, we can eliminate $t_p$ and $h_p$ from the system to obtain the following equation between $\dot e$ and t: [equation illegible in the scan]. Evidently the same curve $\dot e(t)$ is obtained if $h_m$ and $S_g$ are both multiplied by the same constant.

The differential analyzer set-up used is shown in Fig. 6. An attempt was made to generate the sine function with two integrators solving [equation illegible in the scan], but this was found impractical because of the large integrator loading necessary, and an input table was used instead. Even in this case it was necessary to use a very large scale factor on the independent variable shaft due to the small integrating factors (1/32) of the differential analyzer as compared to the ball type (about 1 under comparable conditions). This resulted in solutions which represented, actually, 30 seconds requiring 30 minutes of machine time.

The equations of the deflection mechanism are [equations illegible in the scan; the legible coefficients are .54, 4.700 and 1.692]. It was necessary to approximate the coefficients with available gear ratios on the differential analyzer. Fortunately some very close approximations were found. The equations actually set on the machine were [equations illegible in the scan; the legible coefficients are .54, 4.706 and 1.694]. The error is of the same order as the expected machine error.

Except for runs in group ID the machine was made as "tight" as possible, the backlash being corrected by frontlash units. Due to the large scale factors used and the high inherent precision of the integrators used in the differential analyzer, the runs may be expected to be more accurate than the actual deflection mechanism.
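The closeness of the gear-ratio approximations can be made concrete with a small calculation. This is only an illustrative sketch: it uses the coefficient values legible in the report (4.700 and 1.692 as designed, 4.706 and 1.694 as set, .54 unchanged), and the variable names are hypothetical.

```python
# (designed value, value actually set on the analyzer), as legible in the report
pairs = [(4.700, 4.706), (1.692, 1.694), (0.54, 0.54)]

def relative_error(designed, as_set):
    """Fractional departure of the gear-ratio approximation from the design value."""
    return abs(as_set - designed) / abs(designed)

errors = [relative_error(d, s) for d, s in pairs]

# 30 seconds of problem time took 30 minutes of machine time.
time_scale = (30 * 60) / 30

for (d, s), err in zip(pairs, errors):
    print(f"{d} -> {s}: {100 * err:.3f} %")
print("machine seconds per problem second:", time_scale)
```

Both changed coefficients are off by a little over one part in a thousand, which fits the report's remark that the error is of the same order as the expected machine error.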
Solutions were taken in the form of both curves and counter readings. The curves given here were reproduced by pantograph to ordinary graph paper size. Curves not directly drawn by the machine and numerical values quoted are taken from the counter printings, which give an additional decimal place not readable from the curves.

Discussion of Runs

Most of the curves are given with $\dot q$ as dependent variable. To estimate the error in yards for a given error in $\dot q$ from $\dot e$, the chart of Fig. 6A may be used. This is computed from an approximate formula [illegible in the scan] which expresses the error as the range r times a coefficient A times $\Delta \dot q$. For rough comparisons the coefficient A may be taken as 1, the error then being the $\dot q$ error multiplied by the predicted range.

The first set of runs taken were with a sudden impulse $e = k\,1(t)$ with the system at rest, both with and without the middle part of the mechanism. Runs were taken with k = 0.1, 0.2, 0.4, 1.0, 2.0. Typical curves are shown in Figs. 7 and 8. The results are very close to computed curves on the assumption that [expression illegible in the scan] = 1 when k < .4, but above this the non-linearity becomes appreciable. In the worst cases the transient disappeared to within machine errors in 25 seconds, and for most cases within 8 to 12 seconds. The action with the middle part out was considerably more rapid than with it in, the transient being 6 times as great, as had been predicted, this being a special case of a linear forcing function. Fig. 9 is a plot of the time required for the transient in $\dot q$ to reduce to 2/10 of its maximum value. For values of k greater than about .35 the curves cross the axis once with the middle part in. The curves with it out are all identical with k > 2, due to the action of the slip clutch on one integrator.

Next a series of runs were taken with $e = kt\,1(t)$ starting from rest, with $\sin^{-1} k$ = steady state S = 15°, 30°, 45°, 60°, 75° and a final value that is illegible in the scan, the last being the limit of the sine gear, the maximum possible deflection. These runs are shown in Figs. 10 and 11.
The transient died out in all cases within 20 seconds except with x in for S > 75°, in which cases 30 seconds or more was required, due to the action of the slip clutch. These long transients, however, would probably not be troublesome since such large deflections would only occur in practice with the plane almost directly overhead. For the smaller values the response is about equally rapid with x in or out.

Quadratic Forcing Functions

The runs with a quadratic forcing function $e = at^2$ were the first to show the superiority of the mechanism with x in. Runs were taken with a = .01, .02, .03, .04, .10. With a quadratic rate finder the solution $\dot q$ should approach 2at, and with x in this was very nearly true, the discrepancy being due to the sine gear. Some solutions are shown in Figs. 12, 13, and 14. The errors increase with a and with [symbol illegible in the scan]. The maximum slope found in any of the $\dot e$ courses plotted is about equivalent to an a of .05, so that the large errors due to the sine gear with a = .10 need not cause great concern.

Cubic Forcing Functions

For cubic forcing functions the following were used:

$e_1 = -.04\,t^3 + .1\,t^2$
$e_2 = -.001\,t^3 + .05\,t^2$
$e_3 = -.0002\,t^3 + .02\,t^2$

These were chosen as having second order tangency at t = 0 so that the transient is small. The results are shown in Figs. 15 and 16. The response with $e_2$ and especially $e_3$ are very close to the calculated values on assuming the equation linear. The error in $e_1$ is somewhat greater, as in the quadratic case with higher acceleration.

Effect of Backlash

A number of runs were made to determine the effect of backlash using several different forcing functions. In order to increase the amount of backlash, frontlash units were inserted at several critical points in the backwards direction. The results of these runs were, however, completely negative, for no oscillation of any sort was discovered.
The system was given "shocks" by sudden turning of the e shaft and other methods, but the solutions were completely stable. The only results were small consistent errors, of the order of magnitude of the backlash. It is possible that due to the large scale factors used in the set up, even the artificially introduced backlash was not sufficient to cause the oscillation effect.

Response for Typical Courses

The response for the 8 courses described above are shown in Figs. 17 to 24. It may be noted that even on the flat courses (e.g., IV) the operation is poor without x. On the flat courses the response is satisfactory with x, the error being less than 20 yards except sometimes at the hump in $\dot e$. However for the steeper courses errors of 60 or more yards are common after the start of the peak which do not disappear until nearly the end of the course. The action is particularly bad coming down the hump. Fig. 25 is a plot of the error in yards with course VIII, x in.

Response to Error Functions

In Figs. 26 - 28 are shown the responses to some random error functions of various kinds superimposed on courses I and II. The operation in damping out the error is considerably better with x out. However it seems from a consideration of the size of the errors introduced and the responses found that the system, even with x in, damps the errors more than necessary. That is, it might be preferable to increase the speed of response so as to reduce the transient errors in the solutions. Figs. 29 and 30 show the responses when we suddenly start tracking a target in courses I or II with the machine previously at rest, with the target at several points along the course.

Tests with Different Equations

Three runs were made on course VIII, the most difficult one of the group, using three different cubic rate finding equations. The equations used were (assuming linearity) critically damped, with the transfer admittances (1) and (3) [illegible in the scan] and (2), whose legible part reads $\dfrac{1 + 4(j\omega) + 6(j\omega)^2}{[1 + (j\omega)]^4}$. The results of these runs are shown in Figs. 31, 32, and 33 and should be compared with Fig. 24. Of course, this gain is accompanied with a loss in error function damping. With the roots equal to 2 the system had a slight tendency to be unstable on the flat part of the course. This however appeared to be due to the "human backlash" in the operator on the sine table and would probably not be present with a sine gear.

It is easily seen that an increase in the values of the characteristic roots of the equation demands a proportional increase in the power requirements of the integrators. It may be that this will be a design limit in the case of mechanical systems. No difficulty would be experienced here however with electrical integrators.

The main conclusions of this work are as follows:

1. The middle part of the machine is definitely worth while. Although it increases response for accidental following errors, the gain in behavior for actual courses more than offsets this disadvantage.

2. The system behaves nearly enough like the linear system [equation illegible in the scan; the legible coefficients are 1.094, 5.73 and 4.63] that this may be used to calculate its response to within a few per cent, providing $\dot q$ < .6. As this corresponds to a deflection of 37°, the approximation is sufficient for most cases.

3. For targets whose elevation at their nearest point is greater than about 50° fairly large errors occur due to substantial cubic and higher degree terms in e. This indicates that it might be worth while to use a higher order rate finder. Tests made with a cubic rate finder showed greatly improved results.

4.
If the additional cost of another integrator and adder required for cubic rate finding is too great to be justified it appears that the system could be improved by reducing the time constants, for if sufficient power is available from the integrators, the only disadvantage would be increased response to random error functions and our results indicate that they are now damped out more than necessary.

5. There is some indication that better results would be obtained by making the three time constants equal, or more nearly equal than they are now, although this is not certain.
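The distinction between second order and cubic rate finders drawn in this report can be checked with a little polynomial arithmetic. The sketch below is an editorial illustration under stated assumptions: it writes the admittance as $T(s) = s\,N(s)/D(s)$ and calls the finder nth order when $N(s)/D(s) = 1 + O(s^n)$. The coefficient lists are readings of a damaged scan (the numerator $1 + 4.63s$ in particular is an assumption), and the function name is hypothetical.

```python
def rate_finder_order(num, den, tol=1e-12):
    """Return n such that num(s)/den(s) = 1 + O(s**n).

    Coefficients are listed in ascending powers of s and den[0] must be nonzero.
    Returns None when the two polynomials are identical.
    """
    size = max(len(num), len(den))
    pad = lambda coeffs: list(coeffs) + [0.0] * (size - len(coeffs))
    difference = [d - n for n, d in zip(pad(num), pad(den))]
    for power, coeff in enumerate(difference):
        if abs(coeff) > tol:
            return power
    return None

# Critically damped cubic finder: (1 + 4s + 6s^2) / (1 + s)^4
cubic = rate_finder_order([1, 4, 6], [1, 4, 6, 4, 1])

# Deflection mechanism, coefficients as read from the scan (numerator assumed)
mechanism = rate_finder_order([1, 4.63], [1, 4.63, 5.73, 1.094])

print(cubic, mechanism)
```

The lowest power at which numerator and denominator differ gives the order: the admittance with denominator $(1 + s)^4$ agrees with unity through $s^2$, while the mechanism's agrees only through $s$, matching the report's description of the former as cubic and the latter as second order.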
Criteria for Consistency and Uniqueness in Relay Circuits

September 8, 1941

In a system of linear algebraic equations, there are three possible types of degeneracy, namely inconsistency (no possible solution), ambiguity (solutions not uniquely determined) and redundancy (more equations than necessary). Necessary and sufficient conditions are known for these types of degeneracy in terms of the ranks of the coefficient and augmented matrices. Somewhat similar effects can occur in the Boolean equations characterizing relay circuits, giving rise respectively to chattering, ambiguity of relay position for certain values of the independent variables, and redundancy [several lines illegible in the scan, ending with a reference to the circuit discriminant F].

Consider a relay circuit containing n relays $R_1, R_2, \ldots, R_n$. Make and break contacts on $R_i$ are designated $a_i$ and $a_i'$, and we suppose that there are m independent variables $e_1, e_2, \ldots, e_m$, which do not depend on the relay positions. Such a circuit is equivalent to the circuit of Fig. 1, in which $f_i(a_1, \ldots, a_n; e_1, \ldots, e_m)$ is the Boolean function which is zero when the switches $a_1, \ldots, e_m$ are in such positions that the voltage across $R_i$ in the original circuit is sufficient to operate it and one otherwise. The function F = [expression illegible in the scan] will be called the circuit discriminant.
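A brute-force sketch can make the role of such a discriminant concrete. It is an editorial illustration, not the report's formula, which is illegible in the scan: it assumes the discriminant is 1 whenever some relay's position $a_i$ disagrees with its operating function $f_i$, and 0 otherwise, with the value 0 standing for an operated relay or closed contact. The two example circuits and all function names are hypothetical.

```python
from itertools import product

def discriminant(fs, a, e):
    """Assumed discriminant: 1 if any relay position a[i] differs from f_i(a, e), else 0."""
    return int(any(f(a, e) != a_i for f, a_i in zip(fs, a)))

def steady_states(fs, e):
    """All relay positions for which the assumed discriminant vanishes."""
    return [a for a in product((0, 1), repeat=len(fs)) if discriminant(fs, a, e) == 0]

def completely_oscillatory(fs, e):
    """True when the assumed discriminant equals 1 for every relay position."""
    return not steady_states(fs, e)

# Hypothetical buzzer: one relay fed through its own break contact, f_1 = a_1'.
buzzer = [lambda a, e: 1 - a[0]]

# Hypothetical latch: the relay's make contact in parallel with a key e_1, f_1 = a_1 * e_1.
latch = [lambda a, e: a[0] * e[0]]

print(steady_states(buzzer, ()))   # no consistent position: the relay chatters
print(steady_states(latch, (1,)))  # key open: two steady states, an ambiguity
print(steady_states(latch, (0,)))  # key closed: the relay must be operated
```

The buzzer has no position consistent with its own operating function, the latch with the key open has two, and the latch with the key closed has exactly one.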
We also define the following terms: A steady state in a relay circuit corresponding to a given set of values $A_i$ of the independent variables is a set of positions $P_1, P_2, \ldots, P_n$ of the relays such that if the independent variables are given the values $A_i$, and the relays held in the positions $P_1, \ldots, P_n$ long enough for the steady state fluxes in the coils to build up, the relays will remain in the same positions indefinitely. A completely oscillatory state of a relay circuit is a set of values $A_i$ of the independent variables, such that no matter what the initial positions of the relays, or how long they are held in that position, when they are released at least one makes an infinite number of oscillations, i.e. chatters.

In addition to these obviously exclusive possibilities a circuit may be "partially" oscillatory for certain values of the independent variables: with some initial conditions the circuit chatters and with others relapses into a steady state. An example is shown in Figure 2, where with the initial conditions [illegible in the scan] (operated) the circuit chatters while with [illegible in the scan] the circuit relapses into the steady state [illegible in the scan].

Theorem I. For the positions $P_i$, with the values $A_i$ of the independent variables, to be a steady state it is necessary and sufficient that $F = 0$ there. This is necessary since in a steady state the contacts of [passage illegible in the scan]. It is sufficient since [passage illegible in the scan], so that if the relays are held in these positions $P_i$ long enough for fluxes to build up they will remain there.

Theorem II. For the values $A_i$ to be completely oscillatory it is necessary and sufficient that $F = 1$ identically in the $a_i$. This is necessary since otherwise there is a set of $a_i$, say $P_i$, such that $F = 0$, and this is a steady state by Theorem I. It is sufficient since if true then with any starting position, say $P_1, \ldots, P_n$, at least one term of the sum (1), say [term illegible in the scan], is equal to one.
so that [illegible] and one or the other [illegible]; hence, after some relay has changed we still have the same situation, since $f = 1$, so that at least one relay makes an infinite number of changes of position.

In case $f$ is a function of the $R_i$ (not identically one or zero) the system has some steady states, namely the roots of $f = 0$, but for arbitrary starting conditions we cannot say what the motion will be. Whether a circuit seeks out a steady state or not depends not only on the network topology as in Fig. 2, but also on relay characteristics as in Fig. 3. Here if $R_1$ is slow operating and $R_2$ very fast the circuit may chatter with both relays initially unoperated, for $R_2$ may never stay in long enough to operate $R_1$. If $R_1$ is fast and $R_2$ slow release, the system relapses into $R_1 = 0$, $R_2 = 1$. Hence no purely algebraic conditions can be set up to determine whether a circuit will relapse into a steady state when $f$ is a function of the $R_i$.

July 15, 1943

ON THE INTEGRATION OF THE BALLISTIC EQUATIONS ON THE ABERDEEN ANALYZER

by Professor W. Feller of Brown University and Dr. C. E. Shannon of the Bell Telephone Laboratories

AMP REPORT NO. 28.1
APPLIED MATHEMATICS PANEL
NATIONAL DEFENSE RESEARCH COMMITTEE

This is a report on investigations made at the request of Dr. Warren Weaver (letter of December 28, 1942). Our study has been based partly on oral information received in Aberdeen (January 18, 1942) and partly on the material contained in the Report No. 319 of the Ballistic Research Laboratory ("Report on the Differential Analyzer at Aberdeen Proving Ground" by Major A. A. Bennett, December 1942). The technical set-up as described in that report will in the sequel be referred to as "present set-up". It should be clearly understood that we were not to study possible technical improvements of the analyzer as such nor to reexamine the theory underlying the differential equations.
Accordingly, the present report is concerned only with an examination of the procedure of mechanical integration of the differential equations of ballistics as used at present. Furthermore, we have not considered any methods of integration other than on the differential analyzer.

Before proceeding to describe devices which might contribute to the efficiency of the analyzer we wish to summarize some negative findings, as these may render superfluous similar investigations by other persons.

a) We have carefully investigated a great number of alternative set-ups, on the differential analyzer, of the differential equations either in their present form or using various new variables. However, we have been unable to find any form superior to the method as used at present in Aberdeen which, in our opinion, is the most efficient one.

b) We have studied the advisability of using some method of successive approximations. Such methods naturally present themselves since one should expect them to reduce the ranges of the variables involved and thus increase the accuracy. However, a closer study will show that it is almost invariably necessary to subtract, on the analyzer, two large quantities which are themselves independently obtained on the analyzer. This, of course, nullifies the desired effect of reducing the ranges. Various possibilities have been studied and, among them, the possibility of starting with the vacuum trajectories and integrating the difference between them and the actual trajectories. Again we were unable to find a method which would appear superior to the present set-up. It will be noted, however, that the modification of the latter suggested below can in some sense be interpreted as the first step in a method of successive approximations.

c) Several perturbation methods and expansions according to various parameters have been tried, paying special attention to methods suggested in the newest Russian literature.
None of these methods seem appropriate for the analyzer.

Coming to the less negative part of this report we remark that an adequate theory of errors of the differential analyzer is not available at present. However, simple theoretical considerations based on experience gathered at M.I.T. make it appear that a very considerable part of the total error is due [words obscured by a stamp] of error are backlash and, perhaps even more, inaccuracies in the following mechanism for the input and vector tables. It seems therefore possible to achieve a gain in accuracy by reducing the range of the variables in the integrators, even though this may necessitate the introduction of new adders and gears. The following recommendations are based on this assumption. We proceed step by step, starting with the simplest case.

Recommendations.

1) Consider, to begin with, the horizontal displacement $x$. Obviously $dx/dt$ will range from its maximum $\dot{x}_0$ at the beginning to some fraction of it, say $q\dot{x}_0$, at the end. Accordingly, when integrating in the usual form

(1) $x = \int \dot{x}\, dt$

the integrand ranges from $q\dot{x}_0$ to $\dot{x}_0$. Now this means that only a fraction $\frac{1-q}{2}$ of the total range of the integrator disc is used even if we suppose that the scale factor has been chosen in the best way (so that the rim of the integrator disc is used for values of $\dot{x}$ near $\dot{x}_0$). If, instead, we mechanize

(2) $x = \frac{1+q}{2}\,\dot{x}_0\, t + \int \left(\dot{x} - \frac{1+q}{2}\,\dot{x}_0\right) dt,$

the integrand will range from its maximum $\frac{1-q}{2}\,\dot{x}_0$ to its minimum $-\frac{1-q}{2}\,\dot{x}_0$. This allows one to use a scale factor $\frac{2}{1-q}$ times as large as in the set-up (1) and to utilize the entire integrator disc. This, of course, means a considerable gain. Now the constant $\frac{1+q}{2}\,\dot{x}_0$ in the integral in (2) appears only as an initial displacement. It is therefore seen that the realization of the proposed set-up (2) requires, as compared with the customary set-up (1), an additional gear (to produce $\frac{1+q}{2}\,\dot{x}_0\, t$) and an adder.
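The gain claimed for set-up (2) can be checked numerically. The Python sketch below is only an illustration under stated assumptions: the linearly decaying velocity profile, the function names and the trapezoidal rule are all hypothetical choices made for the example, and nothing here models the analyzer hardware. It confirms that subtracting the mid-range constant leaves the integral unchanged while reducing the peak magnitude of the integrand by the factor 2/(1 - q).

```python
def integrate(values, dt):
    """Trapezoidal rule over equally spaced samples."""
    total = 0.0
    for a, b in zip(values, values[1:]):
        total += 0.5 * (a + b) * dt
    return total


def compare_setups(x0_dot=100.0, q=0.4, duration=10.0, steps=1000):
    """Compare set-up (1) with set-up (2) for an assumed velocity profile.

    Returns (x by set-up 1, x by set-up 2, ratio of peak integrand magnitudes).
    """
    dt = duration / steps
    times = [i * dt for i in range(steps + 1)]
    # Assumption: the velocity decays linearly from x0_dot to q * x0_dot.
    x_dot = [x0_dot * (1.0 - (1.0 - q) * t / duration) for t in times]

    # Set-up (1): integrate the velocity directly.
    x_direct = integrate(x_dot, dt)

    # Set-up (2): subtract the mid-range constant, integrate the remainder,
    # and add the constant back through a gear (c * t) and an adder.
    c = 0.5 * (1.0 + q) * x0_dot
    x_shifted = c * duration + integrate([v - c for v in x_dot], dt)

    # The usable scale factor is inversely proportional to the peak magnitude.
    peak_direct = max(abs(v) for v in x_dot)
    peak_shifted = max(abs(v - c) for v in x_dot)
    return x_direct, x_shifted, peak_direct / peak_shifted


if __name__ == "__main__":
    direct, shifted, gain = compare_setups()
    print(direct, shifted, gain)
```

With q = 0.4 the ratio of peak magnitudes comes out close to 2/(1 - 0.4), which is the factor by which the text says the scale factor may be enlarged.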
The following figure shows the simplest mechanization.

[Figure: mechanization of set-up (2)]

It goes without saying that the gear ratio does not need to be exactly $\frac{1+q}{2}\,\dot{x}_0$; any number near the middle of the range of the integrand will do the same service. If used to its fullest extent, the system as described changes a previously positive variable into one taking on also negative values. Although only one change of sign is introduced, this will introduce some new backlash. Now, if instead of (2) we mechanize

(3) $x = q\dot{x}_0\, t + \int \left(\dot{x} - q\dot{x}_0\right) dt,$

the new integrand does not change sign, and no new backlash is introduced. On the other hand, the optimum scale factor for (3) is only $\frac{1}{1-q}$ times that for (1), that is to say half the scale factor for (2). We conclude that with proper corrections for backlash the set-up (2) should prove best. However, if enough frontlash units are not available at Aberdeen, the set-up (3) may be tried with advantage.

2) A similar device can obviously be used wherever the range of the integrand does not utilize the integrator disc to its fullest extent. This is true for almost all integrators whose outputs are: (i) the horizontal displacement $x$, (ii) $s = \int v\, dt$, $v$ being the speed, (iii) [expression illegible], where $y$ is the height. In the first two cases the new set-up would not produce any additional loading since the integrators are driven by the independent variable motor. In other cases an additional loading would ensue which may have to be compensated by the use of a larger scale factor on the t-shaft; this would indirectly slow down the machine. Whether this will have to be done is impossible to predict theoretically. Should it prove necessary, it would be for the user to decide whether the gain in accuracy is worth the loss in speed.

3) If the above described device should prove [word illegible], the following improvement [illegible] at the expense of considerable manual work and loss [illegible]. The process of integration may be stopped at convenient [illegible] intervals. Consider, for example, an integrand such as indicated in the figure.

[Figure: example integrand]

Here, even the usual procedure of integration utilizes the entire range of the integrator disc and no gain can be achieved by means of the device as described above. However, the integrand may conveniently be treated by a double application of this device, splitting the interval of integration into two parts. In other words, instead of a given function $f(x)$ we integrate the difference between $f(x)$ and a step-function. The output of the integrator is no longer $F(x) = \int f(x)\, dx$ but the difference between $F(x)$ and a triangular (or "roof") function.

[Figure: graph of f(x)]

Similarly, with a convenient subdivision we may use any step-function for the integrand and the corresponding polygonal line for the integral. This procedure obviously requires resetting the integrator in question and changing one gear ratio each time the machine is stopped. On the other hand, the increase of the scale factor is roughly proportional to the number of subintervals.

4) In principle this procedure may be looked upon as a special case of the following more general method. Instead of

(4) $w(x) = \int y\, dx$

write

(5) $w(x) + \varphi(x) = \int \left(y + \varphi'\right) dx,$

where $\varphi(x)$ is an arbitrary function and $\varphi'(x)$ its derivative. In practice, of course, $\varphi(x)$ should be chosen so as to render the maximum of $|y + \varphi'|$ as small as possible in order to increase the scale factor on the integrator. Now if $\varphi(x)$ is not a linear function, the mechanization of (5) would require two new input tables or their equivalent. However, the possibility of obtaining some special $\varphi(x)$ by means of non-circular gears should not be overlooked. This would mean a considerable improvement of the linear method.

5) We have been asked by Dr.
Dederick to consider whether it would be advantageous to generate [function symbol illegible] from an input table (instead of by integration, as at present). The foregoing remarks contain an answer to this question. It is not difficult to see that the present method of obtaining the function by integration is more efficient. It would probably become even more so if the recommendation 2) were put into effect.

6) Although it is in no direct connection with the subject of this report, we enclose an Appendix describing a simplified method for computing gear ratios. This method is based on previous experience (of one of us) at M.I.T. and may prove useful in connection with ballistic work on the Aberdeen Analyzer.

Brown University, Providence, R.I. and Bell Telephone Laboratories, N.Y.
May 27, 1943.

W. Feller
C. E. Shannon

A METHOD OF DETERMINING GEAR RATIOS

In this appendix a simplified method of determining gear ratios for an analyzer set up will be described which was used for some time on the M.I.T. analyzer and proved in general to be considerably faster and easier to change than the original method of equalities and inequalities. The method may be briefly outlined as follows:

1. Draw the set up with an unknown gear ratio in each shaft of limited displacement. An unspecified ratio is also placed in the two inputs of each adder.

2. Calculate an approximate scale factor on the independent variable to give the expected time of solution at the average rate at which it turns. Choose an exact scale factor near this approximate one which is a "round figure" in terms of obtainable gear ratios, i.e., factorable into a small number of simple rationals.

3. Choose in the same way scale factors for all shafts of limited displacement (integrator inputs and function table inputs and outputs) so as not to exceed their limits with expected displacements.

4.
This fixes p by division, and from the integrating factor of the integrators, the scale factors and gear ratios of all shafts except those containing adders. In the case of adders the input shaft with smallest scale factor fixes the scale factor of the adder, the other input being geared down to the same scale factor. The output gear in the adder is then fixed.

5. The set up is then inspected to see that no integrators or other parts are too heavily loaded. If they are, reduction gears are transferred from inputs to outputs to reduce loads when possible; otherwise the scale factor on the independent variable is increased.

In case the ratios come out too complicated, different scale factors are chosen in Step 3. With a little practice and foresight, however, it is possible to obtain suitable ratios on the first trial.

Two New Circuits for Alternate Pulse Counting

The well known W-Z relay circuit is shown in Fig. 1. A is a pulsing contact which is alternately opened and closed. Indicating closure of contacts by 0 and openness by 1, and for relays 0 for operated (up) and 1 for unoperated (down), the circuit goes through the following periodic cycle of operation:

[Table with columns A, W, Z: the entries are not recoverable from the scan]

Thus one complete cycle requires two complete pulses on A. This note describes two apparently new circuits which perform the same function. These are shown in Fig. 2 and Fig. 3. The operating cycles for these are:

[Tables for Fig. 2 and Fig. 3 with columns A, W, Z: the entries are not recoverable from the scan]

These three circuits may be compared with regard to the number of elements required as follows:

            Relays   Contacts                    Resistances
Figure 1    2        1 continuity, 1 transfer    2
Figure 2    2        2 continuity, 1 break       1
Figure 3    2        2 transfer, 1 make          1

In Fig. 3 the resistance is theoretically superfluous; if the transfer elements could be trusted never to be shorted it could be omitted, but in practice it would be necessary to avoid shorts when the relays were being adjusted. Figs.
2 and 3 are essentially duals, and 3 was obtained from 2 by the duality theorem. In Fig. 2 it may be noted that the two relays are up when A is closed, while in the standard circuit they are both up when A is open. This might be desirable in some applications. Fig. 3 has the possible disadvantage that both ends of the pulsing contact A are connected into the circuit, while in 1 and 2 one end can be grounded.

C. E. SHANNON

Att. Figs. 1, 2, 3

[Figs. 1, 2 and 3: circuit diagrams]

Counting Up or Down with Pulse Counters

With binary counters of either relay or electronic type it is possible by simple modification to count both up and down. Suppose the largest number that can be registered is $L$. Defining the complement of any number $x$ as $x' = L - x$, we note that subtracting a number $n$ from $S$ is equivalent to adding $n$ to its complement $S'$ and complementing the result. Thus if in a binary counter we take the complement of a reading (which means locking up the relays which are down and vice versa in the relay case, and putting out the tubes which are conducting and vice versa in the electronic case), then let the counter continue to add the number of pulses in question, and finally take the complement again, we have subtracted the number.

Actually, however, this process can be done simply by transferring the carryover leads to the opposite digit (tube or relay). In the relay case this amounts to a transfer contact between each adjacent pair of digits, and an additional make contact. In the electronic case the carryover leads go from the [illegible] tube plate to grids on the next stage. Here we could insert an electronic transfer contact as shown, for example, in Figure 1. When we wish to add, the common control lead for "add" is given cutoff voltage, the "subtract" lead a large negative voltage.
A positive impulse on the "one" plate of a stage then causes one side of the double triode to conduct, giving a negative impulse to the next grid for a carryover. For subtraction the voltages on the control leads are reversed and carryover occurs when the "zero" plate voltage increases, i.e., when this tube goes out.

C. E. SHANNON

Cover Sheet for Technical Memoranda
Research Department

SUBJECT: Circuits for a P.C.M. Transmitter and Receiver - Case 20878

ROUTING: S.A.S., H.W.B., H.F.; Case Files; G. W. Gilman; H. W. Bode; A. G. Jensen; W. M. Goodall; E. Peterson; H. S. Black; W. F. Simpson - Patent Dept.; J. R. Pierce; R. L. Dietzold; C. B. H. Feldman; W. T. Wintringham; F. B. Llewellyn; C. H. Elmendorf; B. M. Oliver; C. E. Shannon

MM 44-110-37
DATE: June 1, 1944
AUTHORS: C. E. Shannon and B. M. Oliver

ABSTRACT

Circuits are described for a P.C.M. transmitter and receiver. The transmitter operates on the principle of counting in the binary system the number of quanta of charge required to nullify the sampled voltage.

Circuits for a P.C.M. Transmitter and Receiver - Case 20878

MM-44-110-37
June 1, 1944

MEMORANDUM FOR FILE

The circuits shown in the present memorandum are intended to fill the boxes of the block functional designs for a PCM transmitter and receiver shown in Fig. 6 of a December 1943 memorandum (MM-43-110-43).

The transmitter functional diagram is shown here as Fig. 1 and the general operation is as follows. The incoming signal is sampled periodically by closing the electronic switch 1 with periodic impulses from the timer. This charges condenser C to the sampled voltage, and the electronic switch opens after each impulse, isolating the condenser from the signal. The existence of a voltage across the condenser causes the comparator to close electronic switch 2, which allows pulses of charge to feed into the condenser from the pulse generator, discharging the condenser.
The number of these pulses is counted in the binary system by the binary counter, and when the condenser is reduced to a reference voltage, the comparator opens electronic switch 2. Near the end of the sampling period the binary counter is connected to the distributer, which registers the binary number counted, and the counter is then reset to zero; both of these operations are controlled by impulses from the timer. The distributer then sends a series of pulses or not down the output line according as the binary digits are 1 or 0. These digits are sent in reverse order, the least important being sent first, to tie in with the contemplated receiver circuit.

The specific circuits are shown in Figs. 2 to 8, and detailed descriptions of their operation follow.

THIS DOCUMENT CONTAINS INFORMATION AFFECTING THE NATIONAL DEFENSE OF THE UNITED STATES WITHIN THE MEANING OF THE ESPIONAGE ACT, 50 U.S.C. 31 AND 32. ITS TRANSMISSION OR THE REVELATION OF ITS CONTENTS IN ANY MANNER TO AN UNAUTHORIZED PERSON IS PROHIBITED BY LAW.

Fig. 2 shows the electronic switch 1 which charges the condenser C to the signal voltage at the sampling times. The signal wave is biased up so that its minimum value is slightly positive, and impressed on terminal 1 as a voltage; i.e., the signal source as seen from terminal 1 is assumed to be of low impedance. The timer, at the sampling time, puts a positive pulse on terminal 2, which is inverted by the triode to give a negative pulse on the pentode control grid. This causes the pentode, which was previously conducting, to cut off. Before the pulse, condenser C had a small minimum positive charge and neither diode was conducting since the plates were held at a low positive potential by the pentode current. As the pentode cuts off, the diode plates swing positive and the right hand diode starts to conduct, charging the condenser.
As this condenser voltage builds up exponentially the voltage on the diode plates also increases positively until it reaches the signal voltage, and at that instant the left hand diode starts to conduct. The voltage stops rising at this point since the plates are now essentially short circuited to the low impedance signal source. This all occurs during the timing pulse, and at the end of this pulse the pentode again starts conducting, dropping the diode plates to a small positive voltage, less than the minimum signal voltage, and isolating the condenser.

Fig. 3 shows a standard multi-vibrator circuit for giving a series of square pulses. The coil condenser cross connection of plates to grids causes the grid transient to be a cosine curve which crosses the cut off grid voltage at a time determined essentially by the LC product and independent of amplitude changes due to variations in plate supply, etc. As this point determines the period of oscillation, the oscillator has good frequency stability. The output appears on terminal 6 as a square wave.

Fig. 4 is the comparator, which is actually only a differential amplifier with sufficient gain so that the granularity voltage applied to the input is capable of driving the amplifier from saturation in one direction to saturation in the other. The input is the voltage on condenser C which, immediately after a sampling instant, will be at the sampled signal voltage. This voltage starts decreasing by steps as the condenser is discharged, and when the condenser voltage applied to terminal 3 moves down the step which crosses the differential amplifier threshold, the amplifier swings from saturation with output terminal 5 at nearly zero voltage to a high negative voltage.

The electronic switch 2 is shown in Fig. 5. This circuit sends units of charge into the condenser through terminal 3 under the control of the comparator output coming in on terminal 5.
The multi-vibrator output is connected to terminal 6 and the output of the multi-grid tube will be a square wave when 5 is positive, which ceases when the comparator swings to the other saturation point, driving the voltage on 5 in the negative direction. The double diode connection gives a pump action. When the plate voltage of the multi-grid tube increases to the upper part of the square wave, the charge flows into the condenser from terminal 4 through the left diode. During the lower part of this wave the condenser discharges through the right diode out into the condenser C, via terminal 3. As this causes the potential of 3 to decrease gradually down a step function, it is necessary for the input voltage at 4 to decrease similarly; otherwise the difference in voltage between 3 and 4 would cause the size of quanta to decrease gradually. This lowering of the voltage on 4 is accomplished by a cathode follower arrangement on the first cathodes in the comparator, which follow the step voltage down.

The binary counter is shown in Fig. 6. The descending step voltage which appears on condenser C is applied to the input of this circuit through terminal 3. The input resistance condenser combination serves as a differentiating circuit (the time constant fairly small compared to the time between steps) so that the voltage applied to the first grid of the double triode consists of a series of negative spikes. The double triode is simply a two stage resistance coupled amplifier, and its output feeds the binary counter digit tubes. This circuit is of standard type with two pentodes in each stage, and there are two stable points for each stage, one with the upper tube cut off and the lower tube conducting, and the other the converse situation. A negative impulse from a preceding stage applied through the coupling condensers changes the state from the previous stable condition to the opposite one.
This impulse is applied symmetrically to both suppressors, but the condenser across the cathode resistances, charged in one direction from the previous state, biases the choice of the next state toward the opposite one. The control grids of the "zero" tubes (the upper row, which are conducting when the corresponding binary digits are zero) are connected to a common control lead which is used to reset the reading to zero after the reading is registered by the distributer. This is accomplished by a negative impulse from the timer. The outputs to the distributer are taken off the plates of the "unit" tubes.

The distributer is shown in Fig. 7. After the number of quanta of charge has been counted in the binary counter, the leads 11, 12, 13, 14, 15 will have either low positive voltages or B+, according as the corresponding digit is one or zero. The grids of the left triodes will then be either negative or positive from the potentiometer action to the negative voltage C-. To register the counter reading, a positive pulse from the timer is applied to the control grid of the common pentode, allowing it to conduct and pulling the cathode of the left triode and the diode in all stages negatively. If a digit is zero, the potential of the cathodes in that stage stops at a positive value due to current through the triode, and the diode does not conduct. If the digit is one the cathodes are pulled negative and the corresponding condenser $C_0$ is discharged through the diode and pentode. At the end of the registering pulse, the cathodes go positive again, isolating each $C_0$, with the digit registered as presence or absence of charge. The reading is taken off the series of condensers $C_0$ in sequence by positive pulses from the timer on leads 21, 22, 23, 24, 25. These pulses allow the right hand triodes to conduct and each $C_0$ in turn to charge through the output lead, leaving them in the normal state (at a voltage about equal to the pulse voltage).
If the digit is "zero" no charge of $C_0$ from the output lead occurs. Thus negative pulses appear on the output when and only when the registered digits are one.

The timer system is shown in Fig. 8. An oscillator, which may be synchronized subharmonically with the pulse generating multi-vibrator, operates at the sampling frequency. This passes through the clipper amplifier to give a square wave, which is differentiated to give alternating positive and negative spikes. A second clipper amplifier eliminates the negative spikes and makes the positive ones rectangular. These short rectangular pulses are fed into a delay line terminated in its characteristic impedance. The timing pulses needed for the various circuit functions are tapped off at the appropriate places as indicated. A synchronizing pulse may also be taken off the same delay line.

Fig. 9 shows the receiver circuit. The signal passes through the clipping amplifier, which is adjusted to give a saturation voltage on the output if a pulse is present and none if absent. This output is applied to the grid of a multigrid pentode, whose other control grid is given positive gating pulses at the center of the digit intervals. These gating pulses allow the pentode to conduct if a pulse is present, and the plate current is then independent of the plate voltage (providing this stays within certain limits), so that if a pulse is present, a fixed amount of charge (equal to the length of the gate times the pentode current) flows onto the condenser. The time constant of the RC system (including the pentode load resistance) is adjusted to allow the voltage to restore itself halfway toward the equilibrium value in the time from one digit to the next, so that after all pulses have been collected on the condenser, the charge contributions of the first, second, third etc.
have decayed by factors of [successive powers of 1/2; the exponents are illegible in the scan]. At this time a positive gating pulse is put on the grid of the second pentode, allowing the condenser to discharge rapidly into the low pass filter.

The timer system can be realized with the systems shown in either Fig. 10 or Fig. 11.

C. E. SHANNON
B. M. OLIVER

Att. Figs. 1 to 11

Pulse Shape to Minimize Bandwidth with Nonoverlapping Pulses

We consider the problem of shaping pulses $\varphi(t)$ which are zero outside $(-\pi, \pi)$ in such a way as to minimize the band width of the power spectrum of the ensemble of functions formed by sending a sequence of the functions $\varphi(t)$ and 0, with spacing of $2\pi$, the probabilities of either being 1/2. First a theorem will be proved on the spectrum of such ensembles of functions.

Theorem: Let an ensemble of functions be defined by [formula illegible], where the $a_n$ are chosen independently and are equally likely to be one or zero. The power spectrum of $f(t)$ then consists of two parts, a point spectrum consisting of the spectrum of [formula illegible], i.e. the spectrum of $\varphi(t)$ repeated, and a continuous part consisting of the energy spectrum of $\varphi(t)$ [factor illegible].

Consider the autocorrelation of $f(t)$: [derivation illegible]. The integrand can be written [formula illegible]. When we average, the sum of the first two parts gives the autocorrelation of the function [formula illegible], since the coefficients $a_n a_m$ ($n \neq m$) have one chance in four of being both equal to one, and in the second term [illegible]. The last term in the limit reduces to [formula illegible]. These two parts give the discrete and continuous parts of the spectrum, the first being the autocorrelation of $\varphi(t)$ repeated and the second giving the energy spectrum of $\varphi(t)$. In case $\varphi(t)$ is contained in $(-\pi, \pi)$, the discrete part has power at $\omega = 0, 1, 2, 3, \ldots$,
corresponding to $f(t) = \frac{a_0}{2} + \sum a_n \cos nt + \sum b_n \sin nt$.

Suppose we wish to shape $\varphi(t)$ lying within $(-\pi, \pi)$ in such a way as to minimize the band spread of the spectrum as measured by [formula illegible]. The contributions of the two parts of the spectrum can be added, and that from the discrete part is [formula illegible]. For the continuous part, using the theorem that [formula illegible], where $f(t)$ and $F(\omega)$ are Fourier transforms, we have [formula illegible], i.e., the same as the discrete contribution. The total is therefore [formula illegible]. To minimize it with a fixed total energy per pulse and with boundary conditions [formula illegible], we must obviously place all the energy in the first term, a cosine curve displaced to be tangent to the time axis.

[Figure and formula for $\varphi(t)$: illegible]

Cover Sheet for Technical Memoranda
Research Department

SUBJECT: A Mathematical Theory of Cryptography - Case 20878

ROUTING: Case Files; H. S. Black; F. B. Llewellyn; H. Nyquist; B. M. Oliver; R. K. Potter; C. B. H. Feldman; R. C. Mathes; R. V. L. Hartley; J. R. Pierce; H. W. Bode; R. L. Dietzold; L. A. MacColl; W. A. Shewhart; S. A. Schelkunoff; C. E. Shannon; Dept. 1000 Files

MM-45-110-92
DATE: September 1, 1945
AUTHOR: C. E. Shannon
INDEX NO. P0.4

ABSTRACT

A mathematical theory of secrecy systems is developed. Three main problems are considered. (1) A logical formulation of the problem and a study of the mathematical structure of secrecy systems. (2) The problem of "theoretical secrecy," i.e., can a system be solved given unlimited time, and how much material must be intercepted to obtain a unique solution to cryptograms. A secrecy measure called the "equivocation" is defined and its properties developed. (3) The problem of "practical secrecy."
How can systems be made difficult to solve, even though a solution is theoretically possible?

A Mathematical Theory of Cryptography - Case 20878

MM-45-110-92
September 1, 1945
Index P0.4

Introduction and Summary

In the present paper a mathematical theory of cryptography and secrecy systems is developed. The entire approach is on a theoretical level and is intended to complement the treatment found in standard works on cryptography.* There, a detailed study is made of the many standard types of codes and ciphers, and of the ways of breaking them. We will be more concerned with the general mathematical structure and properties of secrecy systems.

The presentation is mathematical in character. We first define the pertinent terms abstractly and then develop our results as lemmas and theorems. Proofs which do not contribute to an understanding of the theorems have been placed in the appendix. The mathematics required is drawn chiefly from probability theory and from abstract algebra. The reader is assumed to have some familiarity with these two fields. A knowledge of the elements of cryptography will also be helpful although not required.

The treatment is limited in certain ways. First, there are two general types of secrecy system: (1) concealment systems, including such methods as invisible ink, concealing a message in an innocent text, or in a fake covering cryptogram, or other methods in which the existence of the message is concealed from the enemy; (2) "true" secrecy systems, where the meaning of the message is concealed by cipher, code, etc., although its existence is not hidden. We consider only the second type; concealment systems are more of a psychological than a mathematical problem.
Secondly, the treatment is limited to the case of discrete information, where the information to be enciphered consists of a sequence of discrete symbols, each chosen from a finite set. These symbols may be letters in a language, words of a language, amplitude levels of a "quantized" speech or video signal, etc., but the main emphasis and thinking has been concerned with the case of letters. A preliminary survey indicates that the methods and analysis can be generalized to study continuous cases, and to take into account the special characteristics of speech secrecy systems.

*See, for example, H. F. Gaines, "Elementary Cryptanalysis," or M. Givierge, "Cours de Cryptographie."

The paper is divided into three parts. The main results of these sections will now be briefly summarized. The first part deals with the basic mathematical structure of language and of secrecy systems. A language is considered for cryptographic purposes to be a stochastic process which produces a discrete sequence of symbols in accordance with some system of probabilities. Associated with a language there is a certain parameter D which we call the redundancy of the language. D measures, in a sense, how much a text in the language can be reduced in length without losing any information. As a simple example, if each word in a text is repeated a reduction of 50 per cent is immediately possible. Further reductions may be possible due to the statistical structure of the language, the high frequencies of certain letters or words, etc. The redundancy is of considerable importance in the study of secrecy systems.

A secrecy system is defined abstractly as a set of transformations of one space (the set of possible messages) into a second space (the set of possible cryptograms).
Each transformation of the set corresponds to enciphering with a particular key and the transformations are supposed reversible (non-singular) so that unique deciphering is possible when the key is known. Each key and therefore each transformation is assumed to have an a priori probability associated with it--the probability of choosing that key. The set of messages or message space is also assumed to have a priori probabilities for the various messages, i.e., to be a probability or measure space. In the usual cases the "messages" consist of sequences of "letters." In this case as noted above the message space is represented by a stochastic process which generates sequences of letters according to some probability structure.

These probabilities for various keys and messages are actually the enemy cryptanalyst's a priori probabilities for the choices in question, and represent his a priori knowledge of the situation.

To use the system a key is first selected and sent to the receiving point. The choice of a key determines a particular transformation in the set forming the system. Then a message is selected and the particular transformation applied to this message to produce a cryptogram. This cryptogram is transmitted to the receiving point by a channel that may be intercepted by the enemy. At the receiving end the inverse of the particular transformation is applied to the cryptogram to recover the original message.

If the enemy intercepts the cryptogram he can calculate from it the a posteriori probabilities of the various possible messages and keys which might have produced this cryptogram.
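The calculation just described can be illustrated numerically for a toy system. The following Python sketch is not part of the memorandum: the three-letter alphabet, the shift cipher, the three candidate messages and all probabilities are hypothetical, and the key and message are assumed to be chosen independently. It simply weighs every (message, key) pair that could have produced the intercepted cryptogram by its a priori probability and normalizes.

```python
from collections import defaultdict
from fractions import Fraction

def posteriors(messages, keys, encipher, cryptogram):
    """Return (P(message | cryptogram), P(key | cryptogram)) as dicts.

    messages: dict mapping message -> a priori probability
    keys:     dict mapping key -> a priori probability
    encipher: function (key, message) -> cryptogram
    """
    joint = {}
    for m, pm in messages.items():
        for k, pk in keys.items():
            if encipher(k, m) == cryptogram:
                joint[(m, k)] = pm * pk  # assumes key and message are independent
    total = sum(joint.values())
    if total == 0:
        raise ValueError("this cryptogram cannot be produced by the system")
    post_m, post_k = defaultdict(Fraction), defaultdict(Fraction)
    for (m, k), p in joint.items():
        post_m[m] += p / total
        post_k[k] += p / total
    return dict(post_m), dict(post_k)

ALPHABET = "ABC"  # hypothetical three-letter alphabet

def shift_encipher(key, message):
    """Shift every letter forward by `key` places, wrapping around."""
    return "".join(ALPHABET[(ALPHABET.index(ch) + key) % len(ALPHABET)]
                   for ch in message)

# Hypothetical a priori probabilities for messages and keys.
messages = {"AB": Fraction(1, 2), "BC": Fraction(1, 4), "CC": Fraction(1, 4)}
keys = {0: Fraction(1, 3), 1: Fraction(1, 3), 2: Fraction(1, 3)}

post_m, post_k = posteriors(messages, keys, shift_encipher, "BC")
print(post_m)
print(post_k)
```

In this toy case the message "CC" is eliminated by the interception, while the two surviving messages keep the ratio of their a priori probabilities.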
This set of a posteriori probabilities constitutes his knowledge of the key and message after the interception.* The calculation of these a posteriori probabilities is the generalized problem of cryptanalysis.

As an example of these notions, in a simple substitution cipher with random key there are 26! transformations, corresponding to the 26! ways we can substitute for 26 different letters. These are all equally likely and each therefore has an a priori probability 1/26!. If this is applied to "normal English," the cryptanalyst being assumed to have no knowledge of the message source other than that it is English, the a priori probabilities of various messages of N letters are merely their frequencies in normal English text.

If the enemy intercepts N letters of cryptogram in this system his probabilities change. If N is large enough (say 50 letters) there is usually a single message of a posteriori probability nearly unity, while all others have a total probability nearly zero. Thus there is an essentially unique "solution" to the cryptogram. For N smaller (say N = 15) there will usually be many messages and keys of comparable probability, with no single one nearly unity. In this case there are multiple "solutions" to the cryptogram.

Considering a secrecy system to be a set of transformations of one space into another with definite probabilities associated with each transformation, there are two natural combining operations which produce a third system from two given systems. The first combining operation is called the product operation and corresponds to enciphering the message with the first system R and enciphering the resulting cryptogram with system S, the keys for R and S being chosen independently. This total operation is a secrecy system whose transformations consist of all the products (in the usual sense of products of transformations) of transformations in S with transformations in R.
The probabilities are the products of the probabilities for the two transformations.

The second combining operation is "weighted addition":

$$T = pR + qS, \qquad p + q = 1.$$

It corresponds to making a preliminary choice as to whether system R or S is to be used with probabilities p and q, respectively. When this is done R or S is used as originally defined.

*"Knowledge" is thus identified with a set of propositions having associated probabilities. We are here at variance with the doctrine often assumed in philosophical studies which considers knowledge to be a set of propositions which are either true or false.

It is shown that secrecy systems with these two combining operations form essentially a "linear associative algebra" with a unit element, an algebraic variety that has been extensively studied by mathematicians. Some of the properties of this algebra are developed.

Among the many possible secrecy systems there is one type with many special properties. This type we call a "pure" system. A system is pure if for any three transformations $T_i$, $T_j$, $T_k$ in the set the product

$$T_i T_j^{-1} T_k$$

is also a transformation in the set, and all keys are equally likely. That is, enciphering, deciphering, and enciphering with any three keys must be equivalent to enciphering with some key.

With a pure cipher it is shown that all keys are essentially equivalent--they all lead to the same set of a posteriori probabilities. Furthermore, when a given cryptogram is intercepted there is a set of messages that might have produced this cryptogram (a "residue class") and the a posteriori probabilities of messages in this class are proportional to the a priori probabilities. All the information the enemy has obtained by intercepting the cryptogram is a specification of the residue class. Many of the common ciphers are pure systems, including simple substitution with random key.
In this case the residue class consists of all messages with the same pattern of letter repetitions as the intercepted cryptogram.

Two systems R and S are defined to be "similar" if there exists a fixed transformation A with an inverse, $A^{-1}$, such that

$$R = AS.$$

If R and S are similar, a one-to-one correspondence between the resulting cryptograms can be set up leading to the same a posteriori probabilities. The two systems are cryptanalytically the same.

The second main part of the paper deals with the problem of "theoretical security." How secure is a system against cryptanalysis when the enemy has unlimited time and manpower available for the analysis of intercepted cryptograms?

"Perfect Secrecy" is defined by requiring of a system that after a cryptogram is intercepted by the enemy the a posteriori probabilities of this cryptogram representing various messages be identically the same as the a priori probabilities of the same messages before the interception. It is shown that perfect secrecy is possible but requires, if the number of messages is finite, the same number of possible keys. If the message is thought of as being constantly generated at a given "rate" R (to be defined later), key must be generated at the same or a greater rate.

If a secrecy system with a finite key is used, and N letters of cryptogram intercepted, there will be, for the enemy, a certain set of messages with certain probabilities, that this cryptogram could represent. As N increases the field usually narrows down until eventually there is a unique "solution" to the cryptogram: one message with probability essentially unity while all others are practically zero. A quantity Q(N) is defined, called the equivocation, which measures in a statistical way how near the average cryptogram of N letters is to a unique solution; that is, how uncertain the enemy is of the original message after intercepting a cryptogram of N letters.
Various properties of the equivocation are deduced--for example, the equivocation of the key never increases with increasing N. This quantity Q is a theoretical secrecy index--theoretical in that it allows the enemy unlimited time to analyse the cryptogram.

The function Q(N) for a certain idealized type of cipher called the random cipher is determined. With certain corrections this function can be applied to many cases of practical interest. This gives a way of calculating approximately how much intercepted material is required to obtain a solution to a secrecy system. It appears from this analysis that with ordinary languages and the usual types of ciphers (not codes) this "unicity distance" is approximately |K|/D. Here |K| is a number measuring the "size" of the key space. If all keys are a priori equally likely |K| is the logarithm of the number of possible keys. D is the redundancy of the language and measures the excess information content of the language. In simple substitution with random key on English |K| is $\log_{10} 26!$ or about 20 and D is about .7 for English. Thus unicity occurs at about 30 letters.

It is possible to construct secrecy systems with a finite key for certain "languages" in which the function Q(N) does not approach zero as $N \to \infty$. In this case, no matter how much material is intercepted, the enemy still does not get a unique solution to the cipher but is left with many alternatives, all of reasonable probability. Such systems we call ideal systems. It is possible in any language to approximate such behavior, i.e., to make the approach to zero of Q(N) recede out to arbitrarily large N. However, such systems have a number of drawbacks, such as complexity and sensitivity to errors in transmission of the cryptogram.

The third part of the paper is concerned with "practical secrecy."
Two systems with the same key size may both be uniquely solvable when N letters have been intercepted, but differ greatly in the amount of labor required to effect this solution. An analysis of the basic weaknesses of secrecy systems is made. This leads to methods for constructing systems which will require a large amount of work to solve. A certain incompatibility among the various desirable qualities of secrecy systems is discussed.

PART I

FOUNDATIONS AND ALGEBRAIC STRUCTURE OF SECRECY SYSTEMS

1. Choice, Information and Uncertainty

Suppose we have a set of possible events whose probabilities of occurrence are $p_1, p_2, \ldots, p_n$. These probabilities are known, but that is all we know concerning which event will occur. Can we define a quantity which will measure in some sense how "uncertain" we are of the outcome? How much "choice" is involved in the selection of the event by the chance element that operates with these probabilities? We propose as a numerical measure of this rather vague notion the quantity

$$H = -\sum_{i=1}^{n} p_i \log p_i.$$

There are many reasons for this particular formula. Quantities of this kind appear continually in the present paper and in the study of the transmission of information. To justify this definition we will state a number of properties that follow from it. These properties will not be proved here,* but are easily deduced from the definition.

Properties of $H = -\sum p_i \log p_i$.

1. $H = 0$ if and only if all the $p_i$ but one are zero, this one having the value unity. Thus only when we are certain of the outcome does H vanish.

2. For a given n, H is a maximum and equal to $\log n$ if and only if all the $p_i$ are equal (i.e., $1/n$). This is also intuitively the most uncertain situation.

3. Suppose there are two events in question, with m possibilities for the first and n for the second. Let $p_{ij}$ be the probability of the joint occurrence of i for the first and j for the second. The uncertainty of the joint event is
$$H = -\sum_{i,j} p_{ij} \log p_{ij}.$$

For given probabilities $p_i = \sum_j p_{ij}$ for the first and $q_j = \sum_i p_{ij}$ for the second, the quantity H is maximized if and only if the events are independent, i.e., $p_{ij} = p_i q_j$. This maximum value is the sum of the individual uncertainties

$$H = H_1 + H_2 = -\sum_i p_i \log p_i - \sum_j q_j \log q_j.$$

These facts can be generalized to any number of different events.

* It is intended to develop these results in coherent fashion in a forthcoming memorandum on the transmission of information.

4. Suppose there are two chance events A and B as in 3, not necessarily independent. We define the mean conditional uncertainty of B, knowing A, as

$$\bar{H}_A(B) = \sum_A p(A)\, H_A(B)$$

where $H_A(B)$ is the uncertainty of B when A has a definite value A. Thus $\bar{H}_A(B)$ is the average uncertainty of B for all different events A, weighted according to their different probabilities of occurrence. The uncertainty of the joint event is the sum of the uncertainty of the first and the mean conditional uncertainty of the second. In symbols

$$H(A,B) = H(A) + \bar{H}_A(B).$$

This is true whether or not there are any causal connections or correlations between the two events.

5. In the same situation the uncertainty of B is not greater than the joint uncertainty H(A,B),

$$H(B) \le H(A,B).$$

The equality holds if and only if every B (of probability greater than zero) is consistent with only one A. That is, if A is uniquely determined by B.

6. From properties 3 and 4 we have

$$H(A) + H(B) \ge H(A,B),$$
$$H(B) \ge H(A,B) - H(A) = H(A) + \bar{H}_A(B) - H(A),$$
$$H(B) \ge \bar{H}_A(B).$$

Thus the uncertainty of B is not less than its average value when we know A. Additional information never increases average uncertainty. The equality holds if and only if A and B are independent.

7. Suppose we have a set of probabilities $p_1, p_2, \ldots, p_n$. Any change toward equalization of these (supposing them unequal) increases H.
Thus if $p_1 < p_2$ and we increase $p_1$, decreasing $p_2$ an equal amount (to keep the sum $\sum p_i$ constant at unity) so that $p_1$ and $p_2$ are more nearly equal, then H increases. More generally if we perform any "averaging" operation on the $p_i$, of the form

$$p_i' = \sum_j a_{ij}\, p_j$$

where $\sum_i a_{ij} = \sum_j a_{ij} = 1$ and all $a_{ij} \ge 0$, then H increases (except in the special case where this transformation amounts to no more than a permutation of the $p_j$, with H of course remaining the same).

8. H measures in a certain sense how much "information is generated" when the choice is made. Suppose such a chance event occurs and we wish to describe which of the n possible events took place. The average amount of paper required to write it down in a properly chosen notation is, in the cases of interest to us, about proportional to H. Thus there might be $10^{30} + 10^{50}$ possible events, with $10^{30}$ of them having a probability of $\tfrac{1}{2} \cdot 10^{-30}$ and $10^{50}$ of them having a probability of $\tfrac{1}{2} \cdot 10^{-50}$. We could set up a notational system to describe which event occurs as follows. We number the events from 1 up to $10^{30} + 10^{50}$ and when one occurs write down the corresponding number. The average amount of paper required will be proportional to the average number of digits we need. This will be nearly 30 if the event is in the first group of $10^{30}$, and about 50 if in the second group. Thus the average number of digits is about 40. We also have

$$H = -10^{30} \left(\tfrac{1}{2} \cdot 10^{-30}\right) \log \left(\tfrac{1}{2} \cdot 10^{-30}\right) - 10^{50} \left(\tfrac{1}{2} \cdot 10^{-50}\right) \log \left(\tfrac{1}{2} \cdot 10^{-50}\right) \approx 40.$$

9. Although the last result is only approximately true when the number of choices is finite it becomes exactly true when an unlimited sequence of choices is made. Thus if a sequence of N independent choices is made, each choice being from n possibilities with probabilities $p_1, p_2, \ldots, p_n$, then the total amount of information generated is

$$H = -N \sum_i p_i \log p_i.$$

If N is sufficiently large, the expected number of digits required to register the particular choice made is arbitrarily close to H, providing the
correspondence between sequences of digits and sets of choices is correctly made. If incorrectly made it will be greater than H. Moreover, if N is sufficiently large the probability of needing more than H digits is very small.

10. It can be shown that if we require certain reasonable properties of a measure of choice or uncertainty then the formula $-\sum p_i \log p_i$ necessarily follows. These required properties and the proof of this statement are given in Appendix I. The chief property is that the measure be in a sense additive--if a choice be decomposed into a series of choices the total choice is the sum (properly weighted) of the individual choices.

11. Finally we note that quantities of the type $\sum p_i \log p_i$ have appeared previously as measures of randomness, particularly in statistical mechanics. Indeed the H in Boltzmann's H theorem is defined in this way, $p_i$ being the probability of a system being in cell i of its phase space. Most of the entropy formulas contain terms of this type.

The base which is used in taking logarithms in the formula amounts to a choice of the unit of measure. If the base is 10 we will call the resulting units "digits;" if the base is two the units will be called "alternatives." One digit is about 3.3 alternatives. A choice from 1000 equally likely possibilities is 3 digits or about 10 alternatives.

2. Language as a Stochastic Process

A natural language, such as English, can be studied from many points of view--lexicography, syntax, semantics, history, aesthetics, etc. The only properties of a language of interest in cryptography are statistical properties. What are the frequencies of the various letters, of different digrams (pairs of letters), trigrams, words, phrases, etc.? What is the probability that a given word occurs in a certain message? The "meaning" of a message has significance only in its influence on these probabilities. For our purposes all other properties of language can be omitted. We consider a language,
therefore, to be a stochastic (i.e., a statistical) process which generates a sequence of symbols according to some system of probabilities. The symbols will be the letters of the language together with punctuation, spaces, etc., if these occur. Conversely any stochastic process which produces a discrete sequence of symbols will be said to be a language. This will include such cases as:

1. Natural written languages such as English, German, Chinese.

2. Continuous information sources that have been rendered discrete by some quantizing process. For example, the quantized speech from a PCM transmitter, or a quantized television signal.

3. "Artificial" languages, where we merely define abstractly a stochastic process which generates a sequence of symbols.

The following are examples of artificial languages.

(A) Suppose we have 5 letters A, B, C, D, E which are chosen each with probability .2, successive choices being independent. This would lead to a sequence of which the following is a typical example.

B D C B C E C C C A D C B D D A A E C E E A A B B D A E E C A C E E B A E E C B C E A D

This was constructed with the use of a table of random numbers.*

(B) Using the same 5 letters let the probabilities be .4, .1, .2, .2, .1 respectively, with successive choices independent. A typical "text" in this language is then:

A A A C D C B D C E A A D A D A C E D A E A D C A B E D A D D C E C A A A A A D

* Kendall and Smith, "Tables of Random Sampling Numbers," Cambridge, 1939.

(C) A more complicated structure is obtained if successive letters are not chosen independently but their probabilities depend on preceding letters. In the simplest case of this type a choice depends only on the preceding letter and not on ones before that.
The statistical structure can then be described by a set of transition probabilities $p_i(j)$, the probability that letter i is followed by letter j. The indices i and j range over all the letters in the language. A second equivalent way of specifying the structure is to give the digram probabilities $p(i,j)$, the relative frequency of the digram i j in the language. The letter frequencies $p(i)$ (the probability of letter i), the transition probabilities $p_i(j)$ and the digram probabilities $p(i,j)$ are related by the following formulas.

$$p(i) = \sum_j p(i,j) = \sum_j p(j,i) = \sum_j p(j)\, p_j(i)$$
$$p(i,j) = p(i)\, p_i(j)$$
$$\sum_j p_i(j) = \sum_i p(i) = \sum_{i,j} p(i,j) = 1$$

As a specific example suppose there are three letters A, B, C with the probability tables:

| $p_i(j)$ | j = A | j = B | j = C |
|---|---|---|---|
| i = A | 0 | .8 | .2 |
| i = B | .5 | .5 | 0 |
| i = C | .5 | .4 | .1 |

| i | A | B | C |
|---|---|---|---|
| $p(i)$ | 9/27 | 16/27 | 2/27 |

| $p(i,j)$ | j = A | j = B | j = C |
|---|---|---|---|
| i = A | 0 | 4/15 | 1/15 |
| i = B | 8/27 | 8/27 | 0 |
| i = C | 1/27 | 4/135 | 1/135 |

A typical text in this language is the following.

A B B A B A B A B A B A B A B B B A B B B B B A B A B A B A B A B B B A C A C A B B A B B B B A B B A B A C B B B A B A

The next increase in complexity would involve trigram frequencies but no more. The choice of a letter would depend on the preceding two letters but not on the text before that point. A set of trigram frequencies $p(i,j,k)$ or equivalently a set of transition probabilities $p_{ij}(k)$ would be required. Continuing in this way one obtains successively more complicated stochastic processes. In the general n-gram case a set of n-gram probabilities $p(i_1, i_2, \ldots, i_n)$ or of transition probabilities $p_{i_1, i_2, \ldots, i_{n-1}}(i_n)$ is required to specify the statistical structure.

(D) Stochastic processes can also be defined which produce a text consisting of a sequence of "words." Suppose there are 5 letters A, B, C, D, E and 16 "words" in the language with associated probabilities:

| Prob. | Word | Prob. | Word | Prob. | Word | Prob. | Word |
|---|---|---|---|---|---|---|---|
| .10 | A | .16 | BEBE | .11 | CABED | .04 | DEB |
| .04 | ADEB | .04 | BED | .05 | CEED | .15 | DEED |
| .05 | ADEE | .02 | BEED | .08 | DAB | .01 | EAB |
| .01 | BADD | .05 | CA | .04 | DAD | .05 | EE |

Suppose successive "words" are chosen independently and are separated by a space. A typical message might be:

DAB EE A BEBE DEED DEB ADEE ADEE EE DEB BEBE BEBE BEBE ADEE BED DEED DEED CEED ADEE A DEED DEED BEBE CABED BEBE BED DAB DEED ADEB

If all the words are of finite length this process is equivalent to one of the preceding type, but the description may be simpler in terms of the word structure and probabilities. We may also generalize here and introduce transition probabilities between words, etc.

These artificial languages are useful in constructing simple problems and examples to illustrate various possibilities. We can also approximate to a natural language by means of a series of simple artificial languages. The zero order approximation is obtained by choosing all letters with the same probability and independently. The first order approximation is obtained by choosing successive letters independently but each letter having the same probability that it does in the natural language. Thus in the first order approximation to English, E is chosen with probability .12 (its frequency in normal English) and W with probability .02, but there is no influence between adjacent letters and no tendency to form the preferred digrams such as TH, ED, etc. In the second order approximation digram structure is introduced. After a letter is chosen, the next one is chosen in accordance with the frequencies with which the various letters follow the first one. This requires a table of digram frequencies $p_i(j)$, the frequency with which letter j follows letter i. In the third order approximation trigram structure is introduced. Each letter is chosen with probabilities which depend on the preceding two letters.

3. The Series of Approximations to English

To give a visual idea of how this series of processes approaches a language, typical sequences in the approximations to English have been constructed and are given below. In all cases we have assumed a 27 symbol "alphabet," the 26 letters and a space.

1. Zero order approximation (symbols independent and equiprobable).

XFOML RXKHRJFFJUJ ZLPWCFWKCYJ FFJEYVKCQSGHYD QPAAMKBZAACIBZLHJQD

2. First order approximation (symbols independent but with frequencies of English text).

OCRO HLI RGWR NMIELWIS EU LL NBNESEBYA TH EEI ALHENHTTPA OOBTTVA NAH BRL

3. Second order approximation (digram structure as in English).

ON IE ANTSOUTINYS ARE T INCTORE ST BE S DEAMY ACHIN D ILONASIVE TUCOOWE AT TEASONARE FUSO TIZIN ANDY TOBE SEACE CTISBE

4. Third order approximation (trigram structure as in English).

IN NO IST LAT WHEY CRATICT FROURE BIRS GROCID PONDENOME OF DEMONSTURES OF THE REPTAGIN IS REGOACTIONA OF CRE

5. 1st Order Word Approximation. Rather than continue with tetragram, ..., n-gram structure, it is easier and better to jump at this point to word units. Here words are chosen independently but with their appropriate frequencies.

REPRESENTING AND SPEEDILY IS AN GOOD APT OR COME CAN DIFFERENT NATURAL HERE HE THE A IN CAME THE TO OF TO EXPERT GRAY COME TO FURNISHES THE LINE MESSAGE HAD BE THESE.

6. 2nd Order Word Approximation. The word transition probabilities are correct but no further structure is included.

THE HEAD AND IN FRONTAL ATTACK ON AN ENGLISH WRITER THAT THE CHARACTER OF THIS POINT IS THEREFORE ANOTHER METHOD FOR THE LETTERS THAT THE TIME OF WHO EVER TOLD THE PROBLEM FOR AN UNEXPECTED

The resemblance to ordinary English text increases quite noticeably at each of the above steps. Note that these samples have reasonably good structure out to about twice the range that is taken into account
in their construction. Thus in (3) the statistical process insures reasonable text for two-letter sequences, but four-letter sequences from the sample can usually be fitted into good sentences. In (6) sequences of four or more words can easily be placed in sentences without unusual or strained constructions. The particular sequence of ten words "attack on an English writer that the character of this" is not at all unreasonable.

The first two samples were constructed by the use of a book of random numbers in conjunction for (2) with a table of letter frequencies. This method might have been continued for (3), (4), and (5), since digram, trigram, and word frequency tables are available, but a simpler equivalent method was used. To construct (3) for example one opens a book at random and selects a letter at random on the page. This letter is recorded. The book is then opened to another page and one reads until this letter is encountered. The succeeding letter is then recorded. Turning to another page this second letter is searched for and the succeeding letter recorded, etc. A similar process was used for (4), (5), and (6). It would be interesting if further approximations could be constructed, but the labor involved becomes enormous at the next stage.

The stochastic process 6 is already sufficiently close to English for many cryptographic purposes since most cryptanalysis is based on "local" structure of not more than two or three words in length.

4. Graphical Representation of a Markoff Process

Stochastic processes of the type described above are known mathematically as discrete Markoff processes and have been extensively studied in the literature.*

* For a detailed treatment see M. Fréchet, "Méthodes des fonctions arbitraires. Théorie des événements en chaîne dans le cas d'un nombre fini d'états possibles." Paris, Gauthier-Villars, 1938.

The general case can be described as follows. There exist a finite number of possible "states" of a system; $S_1$, $S_2$,
..., $S_n$. In addition there is a set of transition probabilities; $q_i(j)$, the probability that if the system is in state $S_i$ it will next go to state $S_j$. To make this Markoff process into a language generator we need only assume that a letter is produced for each transition from one state to another. The states will correspond to the "residue of influence" from preceding letters.

The situation can be represented graphically as shown in Figs. 1, 2, 3 and 4. The "states" are the junction points in the graph and the probabilities and letters produced for a transition are given beside the corresponding line. Fig. 1 is for the example B in Section 2, while Fig. 2 corresponds to the example C. In Fig. 1 there is only one state since successive letters are independent. In Fig. 2 there are as many states as letters. If a trigram example were constructed there would be at most $n^2$ states corresponding to the possible pairs of letters preceding the one being chosen. Figs. 3 and 4 show graphs for the case of word structure in example D. In these S corresponds to the "space" symbol. In Fig. 3 each word has a separate chain of branches from the left to the right junction point, while in Fig. 4 the branches have been combined, simplifying the graph.

5. Pure and Mixed Languages

As we have indicated above a "language" for our purposes can be considered to be generated by a Markoff process. Among the possible discrete Markoff processes there is a group with special properties of significance in cryptographic work. This special class consists of the "ergodic" processes and we shall call the corresponding languages "pure languages." Although a rigorous definition of an ergodic process is somewhat involved, the general idea is simple. In an ergodic process every sequence produced by the process is the same in statistical properties.
Thus the letter frequencies, digram frequencies, etc., obtained from particular sequences will, as the lengths of the sequences increase, approach definite limits independent of the particular sequence. Actually this is not true of every sequence but the set for which it is false has probability zero. Roughly the ergodic property means statistical homogeneity.

All the examples of artificial languages given above are pure, the corresponding Markoff process being ergodic. This property is related to the structure of the corresponding graph. If the graph has two properties the language it generates will be pure. These properties are:

1. The graph cannot be divided into two parts A and B such that it is impossible to go from junction points in part A to junction points in part B along lines of the graph in the direction of arrows and also impossible to go from nodes in part B to nodes in part A.

2. A closed series of lines in the graph with all arrows on the lines pointing in the same orientation will be called a "circuit." The "length" of a circuit is the number of lines in it. Thus in Fig. 4 the series BEE is a circuit of length 4. The second property required is that the greatest common divisor of the lengths of all circuits in the graph be one.

If the first condition is satisfied but the second one violated by having the greatest common divisor equal to d > 1, the sequences have a certain type of periodic structure. The various sequences fall into d different classes which are statistically the same apart from a shift of the origin (i.e., which letter in the sequence is called letter 1). By a shift of from 0 up to d - 1 any sequence can be made statistically equivalent to any other. A simple example with d = 2 is the following. There are three possible letters a, b, c. Letter a is followed with either b or c with probabilities 1/3 and 2/3 respectively.
Either b or c is always followed by letter a. Thus a typical sequence is

    a b a c a c a c a b a c a b a b a c a c

This type of situation is not of much importance for our work.

If the first condition is violated the graph may be "separated" into a set of subgraphs each of which satisfies the first condition. We will assume that the second condition is also satisfied for each subgraph. We have in this case what may be called a "mixed" language made up of a number of pure components. The components correspond to the various subgraphs. If $L_1, L_2, L_3, \ldots$ are the component languages we may write

$$L = p_1 L_1 + p_2 L_2 + p_3 L_3 + \cdots$$

where $p_i$ is the a priori probability of the component language $L_i$.

Physically the situation represented is this. There are several different languages $L_1, L_2, L_3, \ldots$ which are each of homogeneous statistical structure (i.e., they are pure languages). We do not know a priori which is to be used, but once the sequence starts in a given pure component it continues indefinitely according to the statistical structure of that component. We do have, however, a set of a priori probabilities for the various components, $p_1, p_2, \ldots$.

As an example one may take two of the artificial languages defined above and assume $p_1 = .2$ and $p_2 = .8$. A sequence from the mixed language $L = .2 L_1 + .8 L_2$ would be obtained by choosing first $L_1$ or $L_2$ with probabilities .2 and .8 and after this choice generating a sequence from whichever was chosen.

A natural language, such as English or German, is not, of course, pure. Different kinds of text, literary, newspaper, technical or military, display consistently different types of structure. These differences are small, however, in comparison with the differences between different natural languages. If only local structure (letter, digram and trigram frequencies, for instance) is of much importance, it is reasonable to consider "normal English" to be nearly pure.

6. Information Rate and Redundancy of a Language

Suppose we have a pure language L produced by a given Markoff process. Associated with the language there are certain parameters which are of significance in questions of transforming the language and in cryptography. The most important of these is what we will call the "information rate" R for the language. It measures the rate at which the Markoff process "generates information," as determined by the measurement of the amount of choice available on the average per letter of text that is produced. In Section 1 we defined the amount of choice when there are various possibilities with probabilities $p_1, p_2, \ldots, p_n$ as

$$H = -\sum_i p_i \log p_i .$$

In a Markoff process with a number of different "states" there will be a choice value $H_i$ for each of these states and a probability of being in each of the states (or a frequency with which this state occurs). If this relative frequency for state i is $P_i$, the average amount of choice is

$$R = \sum_i P_i H_i$$

summed over all the states. This is the definition of the information rate for the language. If $p_i(j)$ is the probability of producing letter j when in state i we have

$$H_i = -\sum_j p_i(j) \log p_i(j),$$

the sum being over all the letters in the language. Thus

$$R = -\sum_{i,j} P_i \, p_i(j) \log p_i(j).$$

The information rate R has the units of alternatives (or digits) per letter since it measures the average amount of choice per letter of text that is produced.

A second parameter of importance is the "maximum rate" $R_0$ for the source. This is defined simply as the logarithm of the number of different letters in the language. $R_0$ is also measured in alternatives or digits per letter. If successive letters are chosen independently and each letter is equally likely, $R_0 = R$. Otherwise we have $R < R_0$.

$R$ and $R_0$ are actually two limiting cases of information rates for the language.
$R_0$ may be said to be the rate when no statistical structure is taken into consideration and $R$ is the rate when all the structure is taken into account. Between these there is an infinite series of rates $R_1, R_2, R_3, \ldots$ which take some of the statistical structure into account. $R_1$ takes the letter frequencies into account and is defined by

$$R_1 = -\sum_i p(i) \log p(i)$$

where $p(i)$ is the probability of letter i. $R_2$ takes digram structure into account and is defined by

$$R_2 = -\sum_{i,j} p(i) \, p_i(j) \log p_i(j)$$

where the $p(i)$ are letter probabilities and the $p_i(j)$ the transition probabilities, i.e., the probability of letter i being followed by letter j. In general we define

$$R_n = -\sum p(i_1, \ldots, i_{n-1}) \, p_{i_1 \cdots i_{n-1}}(i_n) \log p_{i_1 \cdots i_{n-1}}(i_n)$$

where the sum is on all indices $i_1, \ldots, i_n$ and $p(i_1, \ldots, i_{n-1})$ is the probability of the (n-1)-gram $i_1 \cdots i_{n-1}$, with $p_{i_1 \cdots i_{n-1}}(i_n)$ the probability of this (n-1)-gram being followed by letter $i_n$. $R_n$ may be called the n-gram information rate for the language. It can be shown that

$$R_0 \ge R_1 \ge R_2 \ge \cdots \ge R_\infty = R .$$

These rates determine how much a language can be "compressed" in length by a suitable encoding process. A language with maximum rate $R_0$ and rate $R$ can be transformed in such a way that a sequence of letters N letters long is transformed into a sequence of letters only $N'$ letters long where

$$N' R_0 = N R .$$

(This is approximate and only exactly true in the limit as $N \to \infty$.) Thus the information is "compressed" in the ratio $R / R_0$. This is the greatest compression ratio possible. It makes use of all the statistical structure of the language. If only n-gram structure is made use of, a compression ratio $R_n / R_0$ is the best possible.

The compression obtained in this way is only a statistical gain. Some infrequent sequences are encoded into much longer sequences while the more probable ones go into shorter sequences so that on the average the length is decreased.
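As a numerical check on the chain of inequalities above, the rates can be computed for a small language. The two-letter digram language below is hypothetical (it is not one of the memorandum's examples), logarithms are taken to base 2 so that the rates come out in alternatives per letter, and the helper names are illustrative. For a language with digram structure only, $R_2$ already equals $R$.

```python
import math

# Hypothetical digram language on two letters (an assumption for illustration).
# P[i][j] is the probability that letter i is followed by letter j.
P = {
    "A": {"A": 0.9, "B": 0.1},
    "B": {"A": 0.5, "B": 0.5},
}

def stationary(P, steps=1000):
    """Letter probabilities p(i), found by repeatedly applying the transitions."""
    p = {i: 1 / len(P) for i in P}
    for _ in range(steps):
        p = {j: sum(p[i] * P[i].get(j, 0.0) for i in P) for j in P}
    return p

def choice(probs):
    """Amount of choice, -sum p log2 p, in alternatives (binary digits)."""
    return -sum(q * math.log2(q) for q in probs if q > 0)

p = stationary(P)
R0 = math.log2(len(P))                             # maximum rate
R1 = choice(p.values())                            # letter frequencies only
R2 = sum(p[i] * choice(P[i].values()) for i in P)  # digram structure; equals R here

print("R0, R1, R2 =", round(R0, 4), round(R1, 4), round(R2, 4))
print("best compression ratio R/R0 =", round(R2 / R0, 4))
```

The printed rates decrease from $R_0$ to $R_2$, and the last line is the ratio $R/R_0$ by which long sequences of this particular language could at best be shortened.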
It is the type of compression obtained in telegraphy by using the shortest telegraph symbol, a single dot, for the most frequent letter E, while uncommon letters Q, Z, etc. are encoded into longer telegraph symbols. An average reduction in time of transmission is obtained but there are possible sequences, e.g., Q Q Q ..., which require much longer.

Performing a transformation on a language L which compresses as much as possible will be called reducing L to a "normal" form. When this has been done it can be shown that all letters in the output are equally likely and independent. Actually to realize this transformation would usually require an infinitely complex machine, but we can always approximate it as closely as desired with a machine of finite complexity.

The quantity $D = R_0 - R$ will be called the redundancy rate of the language. It measures the excess information that is sent if sequences in the language are transmitted in their original form (without compression or reduction to normal form). Correspondingly there is a whole series of redundancy rates:

$$D_1 = R_0 - R_1, \quad D_2 = R_0 - R_2, \quad \ldots, \quad D_n = R_0 - R_n .$$

$D_n = R_0 - R_n$ is the redundancy rate due to n-gram structure in the language.

The redundancy D can also be said to measure the amount of statistical structure in the language. If the sequence is purely random, $D = 0$, while at the other extreme, if each letter is completely determined by preceding letters with no freedom of choice, D has its maximum possible value $R_0$. It is sometimes convenient to use the "relative" redundancy $D / R_0$, which must lie between 0 and 100%.

If we have a source of rate $R$, maximum rate $R_0$ (both in digits per letter) and consider the possible sequences of N letters, these fall into two groups for N large. One group of "high probability" sequences contains about $10^{RN}$ sequences (where we have assumed R measured in digits per letter). All of these have substantially the same logarithmic probability.
The remainder of the total of $10^{R_0 N}$ possible sequences are of very small probability. In fact their total probability approaches zero as N increases. The logarithm of the probability of an individual sequence in the high probability group is thus about $-RN$. In a precise statement of these results we must allow a certain fuzziness in R, i.e., replace R by $R \pm \epsilon$ where $\epsilon \to 0$ as $N \to \infty$.

Reduction of a language to normal form is performed by properly matching the probabilities of sequences to the length of the corresponding sequences in the normal form. The "high probability" sequences are translated into short sequences and the remainder into longer sequences.

An example will clarify the results we have given. Let the language contain 4 letters A, B, C, D. In a sequence successive letters are chosen independently, the four letters having probabilities 1/2, 1/4, 1/8, 1/8, respectively. We have

$$R_0 = \log_2 4 = 2 \text{ alternatives/letter}$$

and

$$R_1 = R_2 = \cdots = R = -\left( \tfrac{1}{2} \log_2 \tfrac{1}{2} + \tfrac{1}{4} \log_2 \tfrac{1}{4} + \tfrac{2}{8} \log_2 \tfrac{1}{8} \right) = \tfrac{1}{2} + \tfrac{1}{2} + \tfrac{3}{4} = \tfrac{7}{4} \text{ alternatives/letter.}$$

By a suitable transformation the average length of sequences can be reduced by the factor $(7/4)/2 = 7/8$. A transformation to do it is the following. First we translate into a sequence of binary digits (0 or 1) by the following table

    A    0
    B    10
    C    110
    D    111

After this, pairs of the binary digits are translated into the original alphabet as follows

    00   A'
    01   B'
    10   C'
    11   D'

For a typical sequence this works out as shown below:

    A  B   C    A  B   A  C    B   B   D    A  A  D    A  D    A
    0  10  110  0  10  0  110  10  10  111  0  0  111  0  111  0

Regrouping and translation back into letters:

    01 01 10 01 00 11 01 01 01 11 00 11 10 11 10
    B' B' C' B' A' D' B' B' B' D' A' D' C' D' C'

In this case there are 16 letters in the original and 15 in the final text. Thus due to the small redundancy and the shortness of the text only part of the saving is evident. In a long text however the full reduction of 7/8 would appear. This may be verified directly in this case.
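The worked example above is small enough to check mechanically. The sketch below uses the two translation tables and the 16-letter sequence from the text; the function name and the padding of an odd number of binary digits with a final 0 are assumptions added for the illustration (the text does not discuss odd lengths).

```python
import math
from fractions import Fraction

# Letter probabilities and the two translation tables from the example.
PROB = {"A": Fraction(1, 2), "B": Fraction(1, 4), "C": Fraction(1, 8), "D": Fraction(1, 8)}
TO_BITS = {"A": "0", "B": "10", "C": "110", "D": "111"}
FROM_PAIRS = {"00": "A'", "01": "B'", "10": "C'", "11": "D'"}

def reduce_to_normal_form(text):
    """Letters -> binary digits -> letters, two binary digits per output letter."""
    bits = "".join(TO_BITS[ch] for ch in text)
    if len(bits) % 2:
        bits += "0"  # assumption: pad an odd-length digit string with one 0
    return [FROM_PAIRS[bits[i:i + 2]] for i in range(0, len(bits), 2)]

R0 = math.log2(len(PROB))                                  # maximum rate
R = -sum(float(q) * math.log2(float(q)) for q in PROB.values())  # information rate
bits_per_letter = sum(q * len(TO_BITS[ch]) for ch, q in PROB.items())

original = "ABCABACBBDAADADA"
reduced = reduce_to_normal_form(original)

print("R0 =", R0, " R =", R, " binary digits per letter =", bits_per_letter)
print(" ".join(reduced))
print(len(original), "letters in,", len(reduced), "letters out")
```

For the sample sequence this reproduces the 15-letter result given above, and the average of 7/4 binary digits per original letter, halved, gives the long-run reduction factor 7/8.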
In a long text of N letters each letter will appear with about its appropriate frequency. Thus the number of binary digits will be about

$$N \left[ \tfrac{1}{2} \cdot 1 + \tfrac{1}{4} \cdot 2 + \tfrac{1}{8} \cdot 3 + \tfrac{1}{8} \cdot 3 \right] = \tfrac{7}{4} N$$

since each A gives one binary digit, each B gives two, etc. The number of letters in the final text is half this since each pair of binary digits goes into one letter. Thus the reduction is by a factor 7/8.

It is also easy to see in this case that the binary digits are equally likely and independent, and from this that the final text letters are also.

This situation is more complicated for mixed languages and we shall not enter into it here. We may note, however, that if

$$L = p_1 L_1 + \cdots + p_n L_n$$

where $L_i$ is pure with rate $R_i$, then the long sequences of L fall into (n+1) groups. The first n groups correspond to the pure components. Those in group i number about $10^{R_i N}$ and have logarithmic probability about $\log p_i - R_i N$. The last group contains all other sequences and has a small total probability.

7. Redundancy Characteristic of a Language

The form of the curve $D(N)$ as a function of N may be called the redundancy characteristic of the language. In a rough way it describes the way in which the redundancy appears. In Fig. 5 several types of characteristics are shown, all with the same final redundancy. The way in which this final value is approached is of importance in cryptography. For languages which reach final redundancy at one or two letters (Curves 1 and 2) one type of cipher (ideal ciphers) can be used. For those which remain near zero out to fairly large N (like Curve 5) another type is appropriate. Natural languages are apt to show a characteristic more like 3, and this makes them difficult to encipher with security by simple means.

Examples:

1. A language in which successive letters are independent but with different probabilities has a characteristic of Type 1.

2. Consider a language constructed as follows. First select $26^8$ different sequences of letters, each 16 letters long, from the $26^{16}$ possible sequences of this length.
This should be a random selection. The 16-letter sequences chosen are the "words" of the language. Messages are random sequences of these "words." Such a language has a characteristic like the Curve 5.

3. A language with digram structure only, such as Example C in Section 2 above, has a characteristic of the Type 2 in Fig. 5, reaching its final value at N = 2.

4. English has the characteristic 3 in Fig. 5.

The redundancy characteristic describes how the structure in the language is spread out. If the structure is localized, the curve rises rapidly to its final value. If there are long range influences the asymptotic value is approached more slowly. If the structure is "locally random" the curve will remain near zero for small N.

8. Secrecy Systems

Before we can apply any mathematical analysis to secrecy systems, it is necessary to idealize the situation suitably, and to define in a mathematically acceptable way what we shall mean by a secrecy system. A "schematic" diagram of a general secrecy system is shown in Fig. 6. At the transmitting end there are two information sources, a message source and a key source. The key source produces a particular key from among those which are possible in the system. This key is transmitted by some means, supposedly not interceptible, e.g. by messenger, to the receiving end. The message source produces a message (the "clear") which is enciphered, and the resulting cryptogram is sent to the receiving end by a possibly interceptible means, for example radio. At the receiving end the cryptogram and key are combined in the decipherer to recover the message.

Evidently the encipherer performs a functional operation. If M is the message, K the key, and E the enciphered message, or cryptogram, we have

$$E = f(M, K),$$

i.e. E is a function of M and K. We prefer to think of this, however, not as a function of two variables but as a (one parameter) family of operations or transformations, and we write it

$$E = T_i M .$$
The transformation $T_i$ applied to message M produces cryptogram E. The index i corresponds to the particular key being used. If there are m possible keys there will be m transformations in the family, $T_1, T_2, \ldots, T_m$.

At the receiving end it must be possible to recover M, knowing E and K. Thus the transformations in the family must have unique inverses $T_i^{-1}$,

$$M = T_i^{-1} E ;$$

at any rate this inverse must exist uniquely for every E which can be obtained from an M with key i.

The key source can be thought of as a "probability machine," something which chooses from the possible keys according to a system of probabilities. Mathematically then, the keys (or the parameter of the family of transformations) belong to a probability or measure space. Hence we arrive at the definition: A secrecy system is a family of uniquely reversible transformations $T_i$ of a message space $\Omega_M$ into a cryptogram space $\Omega_E$, the parameter i belonging to a probability space $\Omega_K$. Conversely any set of entities of this type will be called a "secrecy system."

The system can be visualized mechanically as a machine with one or more controls on it. A sequence of letters, the message, is fed into the input of the machine and a second series emerges at the output. The particular setting of the controls corresponds to the particular key being used. Some method must be prescribed for choosing the key from all the possible ones.

To make the problem mathematically tractable we shall assume that the enemy knows the system being used. That is, he knows the family of transformations $T_i$, and the probabilities of choosing various keys. One might object to this as being unrealistic, in that the cryptanalyst often does not know what system was used or the probabilities of various keys. There are two answers to this objection.

1. The assumption is actually the one ordinarily used in cryptographic studies.
It is pessimistic and hence safe, but in the long run realistic (particularly in military work), since one must expect his system to be found out eventually through espionage, captured equipment, prisoners, etc. Thus, even when an entirely new system is devised, so that the enemy cannot assign any a priori probability to it without discovering it himself, one must still live with the expectation of his eventual knowledge.

2. The restriction is much weaker than appears at first, due to our broad definition of what constitutes the system. Suppose a cryptographer intercepts a message and does not know whether a substitution, transposition, or Vigenere type cipher was used. He can consider this as being enciphered by a system in which part of the key is the specification of which of these types was used, the next part being the particular key for that type. These three different possibilities are assigned probabilities according to his best guesses of the a priori probabilities of the encipherer using the respective types of cipher.

A second possible objection to our definition of secrecy systems is that no account is taken of the common practice of inserting nulls in a message and the use of multiple substitutes. Thus there is not a unique $E = T_i M$, but actually the encipherer can choose at will among a number of different E's for the same message and key. This situation could be handled, but would only add complexity at the present stage, without altering any of the basic results. To define the more general secrecy system, one would add a second parameter to the transformations $T_i$, which corresponds to the various choices of cryptograms corresponding to a given message and key. It is possible, but not always desirable, to consider this second parameter as part of the key, since it does not need to be transmitted to the receiving point.
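The broad notion of a system used in the second answer above, where part of the key names the type of cipher and the rest is the particular key for that type, can be sketched as follows. Everything specific here is an assumption made for illustration: the two cipher types (an alphabet shift and a reversal of the text), the guessed probabilities 7/10 and 3/10, the equal likelihood of keys within a type, and all function names.

```python
from fractions import Fraction
import string

ALPHABET = string.ascii_uppercase

def shift(text, k):
    """Advance every letter k places in the alphabet (k may be negative)."""
    return "".join(ALPHABET[(ALPHABET.index(c) + k) % 26] for c in text)

def reverse(text, _key=None):
    """Write the text backwards; this type has a single key, so _key is ignored."""
    return text[::-1]

# Each cipher type: (encipher, decipher, list of its own particular keys).
TYPES = {
    "shift": (shift, lambda e, k: shift(e, -k), list(range(26))),
    "reverse": (reverse, reverse, [None]),
}

# Hypothetical a priori guesses for which type the encipherer uses.
TYPE_PROB = {"shift": Fraction(7, 10), "reverse": Fraction(3, 10)}

# The key of the combined system is the pair (type, particular key).
KEY_PROB = {
    (name, k): TYPE_PROB[name] / len(keys)
    for name, (_, _, keys) in TYPES.items()
    for k in keys
}

def encipher(message, key):
    name, k = key
    return TYPES[name][0](message, k)

def decipher(cryptogram, key):
    name, k = key
    return TYPES[name][1](cryptogram, k)

message = "ATTACK"
print(len(KEY_PROB), "keys, total probability", sum(KEY_PROB.values()))
print(encipher(message, ("shift", 3)), encipher(message, ("reverse", None)))
```

The combined object is again a single family of uniquely reversible transformations with a probability attached to each key, which is all the definition asks for.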
We also assume that the enemy is in possession of the measure in the space $\Omega_M$, the a priori probabilities of various messages. The same objection and essentially the same answers might be given to this assumption as to his knowledge of the transformations $T_i$. This measure, however, we do not consider as part of the secrecy system for reasons which will appear later. The secrecy system whose transformations are $T_i$ will be denoted by T, and this concept includes the space $\Omega_M$ on which T operates (without its measure), the transformations $T_i$, and the spaces $\Omega_K$ and $\Omega_E$, the former with its probability measure.

If the messages are produced by a Markoff process of the type described previously, the probabilities of various messages are determined by the structure of the Markoff process. For the present, however, we wish to take a more general view of the situation and regard the messages as merely an abstract set of entities with associated probabilities, not necessarily composed of a sequence of letters and not necessarily produced by a Markoff process.

It should be emphasized that throughout the paper a secrecy system means not one but a set of many transformations. After the key is chosen only one of these transformations is used and we might be led to define a secrecy system as a single transformation on a language.* The enemy, however, does not know what key was chosen and the "might have been" keys are as important for him as the actual one. Indeed it is only the existence of these other possibilities that gives the system any secrecy. Since the secrecy is our primary interest, we are forced to this rather elaborate concept of a secrecy system.

*A. A. Albert, in a paper presented at a Manhattan, Kansas, meeting of the American Mathematical Society (Nov. 22, 1941) entitled "Some Mathematical Aspects of Cryptography," has defined a ciphering system in this way. With this limited definition about all one can do is to describe and classify from the mathematical point of view various types of transformations.
This type of situation, where possibilities are as important as actualities, is almost the rule in games of strategy. The course of a chess game is largely controlled by threats which are not carried out. See also the "virtual existence" of unrealized imputations in von Neumann's theory of games.

There are a number of difficult epistemological questions connected with the theory of secrecy, or in fact with any theory which involves questions of probability (particularly a priori probabilities, Bayes' theorem, etc.) when applied to a physical situation. Treated abstractly, probability theory can be put on a rigorous logical basis with the modern measure theory approach.* As applied to reality, however, especially when "subjective" probabilities and unrepeatable experiments are concerned, there are many questions of logical validity. For example, in the approach to secrecy made here, a priori probabilities of various keys are assumed known by the enemy cryptographer; how can one determine operationally if his estimates are correct, on the basis of his knowledge of the situation?

It may happen that the keys are chosen by the cipherer according to one system of probabilities, i.e. a measure in the key space $\Omega_K$, and that the enemy cryptanalyst estimates a second different system of probabilities $\Omega_K'$ in this space which are entirely reasonable in the light of his knowledge of the situation; which is correct? I believe that both are correct. The calculation based on $\Omega_K$ leads to the solution when the enemy knows just how the keys are chosen, and the solution based on $\Omega_K'$ leads to solutions which are correct for a situation agreeing with the enemy's knowledge of the actual situation. It appears intuitively that the enemy's lack of knowledge can only do him harm, and probably this can be proved, but this question has not been investigated. In fact, we assume only one measure $\Omega_K$ in the key space. Similar remarks may be made regarding measure in the message space $\Omega_M$.

*See J. L. Doob, "Probability as Measure," Annals of Math. Stat., v. 12, 1941, pp. 206-214. A. Kolmogoroff, "Grundbegriffe der Wahrscheinlichkeitsrechnung," Ergebnisse der Mathematik, v. 2, No. 3 (Berlin 1933).

Actually in practical situations, only extreme errors in a priori probabilities of keys and messages cause much error in the important parameters. This is because of the exponential behavior of the number of messages, etc., and the logarithmic measures employed.

With regard to the application of the mathematical theory of probability to physical situations there are two main theories or ways of setting up the correspondence. (1) The frequency theory. Probability is correlated with relative frequency of an event. This is the correspondence used by the practicing statistician, in principle by the physicist, etc. (2) The degree of belief approach. Probability is a subjective phenomenon and measures one's degree of belief in the occurrence of an event. This approach is seen often in the work of historians, judges, and in everyday life. Although this latter approach has often been attacked as meaningless we cannot agree with this opinion. In the first place the intuitive approach can be given a rigorous mathematical foundation. This has been done in a very elegant way by B. O. Koopman.* Essentially one need only assume that a person be capable of making probability judgments (Event A is more or less probable than event B or they are equiprobable) and that his judgments be self consistent (e.g. if he judges A more probable than B and B more probable than C he should judge A more probable than C). One can even establish numerical values by the use of a "standard gauge," for example a roulette wheel, and thus relate the subjective and the frequency probabilities. In the second place, on pragmatic grounds one can hardly ignore the subjective applications, since almost all of our everyday decisions are based on this sort of probability judgment.
Cryptographic work involves both types of applications. In the use of frequency tables, significance tests etc., the cryptanalyst is following the frequency approach. In the "intuitive" methods of cryptanalysis (probable words etc.) the degree of belief approach is more in evidence.

*B. O. Koopman, "The Axioms and Algebra of Intuitive Probability," Annals of Mathematics, v. 41, no. 2, 1940, p. 269. "Intuitive Probabilities and Sequences," v. 42, no. 1, 1941, p. 169.

We may remark that a single operation on a language which is reversible forms a degenerate type of secrecy system under our definition, a system with only one key of unit probability. Such a system has no secrecy: the cryptanalyst finds the message by applying the inverse of this transformation, the only one in the system, to the intercepted cryptogram. The decipherer and cryptanalyst in this case possess the same information. In general, the only difference between the decipherer's knowledge and the enemy cryptanalyst's knowledge is that the decipherer knows the particular key being used, while the cryptanalyst only knows the a priori probabilities of the various keys in the set. The process of deciphering is that of applying the inverse of the particular transformation used in enciphering to the cryptogram. The process of cryptanalysis is that of attempting to determine the message (or the particular key) given only the cryptogram and the a priori probabilities of various keys and messages.

A system will be called "closed" if any possible cryptogram can be deciphered with any possible key. This means that the inverse transformations $T_i^{-1}$ are all defined for every element in the cryptogram space.

We shall use the notation $|M|$ for the "size" of the message space:

$$|M| = -\sum P(M) \log P(M)$$

where $P(M)$ is the probability of message M and the sum is over all messages of just N letters. Thus $|M|$ is a function of N and measures the amount of "choice" in the selection of an N letter message. For large N, $|M|$ is approximately $RN$.
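The statement that $|M|$ is approximately $RN$ for large N can be checked by direct enumeration for a small source. The two-letter Markoff language below is hypothetical, the first letter is assumed to be drawn from the stationary letter probabilities of that language, logarithms are to base 2, and the function names are illustrative.

```python
import math
from itertools import product

# Hypothetical digram language (an assumption for illustration).
P = {"A": {"A": 0.9, "B": 0.1}, "B": {"A": 0.5, "B": 0.5}}
START = {"A": 5 / 6, "B": 1 / 6}  # stationary letter probabilities of P

# Information rate R = sum_i P_i H_i for this language.
R = sum(START[i] * -sum(q * math.log2(q) for q in P[i].values()) for i in P)

def message_probability(message):
    """P(M) for a message given as a sequence of letters."""
    prob = START[message[0]]
    for prev, nxt in zip(message, message[1:]):
        prob *= P[prev][nxt]
    return prob

def size_of_message_space(n):
    """|M| = -sum P(M) log2 P(M) over all messages of just n letters."""
    total = 0.0
    for letters in product("AB", repeat=n):
        prob = message_probability(letters)
        total -= prob * math.log2(prob)
    return total

for n in (1, 4, 8, 12):
    size = size_of_message_space(n)
    print(n, round(size, 4), "per letter:", round(size / n, 4), "R:", round(R, 4))
```

The per-letter figure in each printed row moves toward R as n grows; the remaining gap comes from the choice of the first letter, which is spread over more and more letters.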
Similarly $|K|$ is the size of the key space

$$|K| = -\sum P(K) \log P(K),$$

the sum being over all keys.

9. Representation of Systems

A secrecy system can be represented in various ways. One which is convenient for illustrative purposes is a line diagram, as in Figs. 7, 10, 11. The possible messages are represented by points at the left and the possible cryptograms by points at the right. If a certain key, say key 1, transforms message $M_i$ into cryptogram $E_j$, then $M_i$ and $E_j$ are connected by a line labeled 1, etc. From each possible message there must be exactly one line emerging for each different key.

A second representation is by means of a rectangular array. This may be done in three different ways. For the closed system of Fig. 7, the three arrays are as follows:

[Three arrays, not legible in this copy: the cryptogram for each message and key, the key or keys for each message and cryptogram, and the message for each cryptogram and key, followed by a sentence reading sample entries from each array.]

All of these arrays and the line diagram contain equivalent information; from any one the others can be derived.

These representations describe the set of transformations in the system. To specify the system completely the probabilities of the various keys must also be given. This may be done by merely listing the keys with their probabilities. Similarly the message space is specified by listing the probabilities of the various messages.

A more common way to describe the set of transformations is to state the operations one performs on the message for an arbitrary key to obtain the cryptogram. Similarly one defines implicitly the probabilities for various keys by describing how a key is chosen, or what we know of the enemy's habits of key choice. The probabilities for messages are implicitly determined by stating our a priori knowledge of the enemy's language habits, the tactical situation (which will influence the probable content of the message) and any special information we may have regarding the cryptogram.

10. Notation

The following notation will generally be followed.

    M = the message
    K = the key
    E = the enciphered message or cryptogram
    $\Omega_M$ = the message space, also a probability space with the a priori probabilities of the messages
    $\Omega_K$ = the key space, also a probability space
    $\Omega_E$ = the cryptogram space, also a probability space, since the probabilities in $\Omega_M$ and $\Omega_K$ induce probabilities for each cryptogram
    $m_i$ = the i-th letter of the message
    $e_i$ = the i-th letter of the cryptogram
    $k_i$ = the i-th letter of the key when it can be so described

Generally P stands for a probability. Conditional probabilities are indicated with subscripts. Thus

    $P(M)$ = probability of message M
    $P(E)$ = probability of cryptogram E
    $P(K)$ = probability of key K
    $P_M(E)$ = conditional probability of E if message M is chosen
    $P_E(M)$ = conditional probability of M if cryptogram E is intercepted, i.e. the a posteriori probability of M if E is observed
    $Q$ = equivocation, a concept to be defined precisely later, which measures the uncertainty of some knowledge defined only by probabilities. We also have conditional equivocations; thus $Q_M(K)$ is the equivocation of the key knowing the message.
    $|K| = -\sum P(K) \log P(K)$, the size of the key space
    $|M| = -\sum P(M) \log P(M)$, the size of the message space
    $|E| = -\sum P(E) \log P(E)$, the size of the cryptogram space
    $m$ = number of different keys
    $N$ = number of intercepted letters
    $R_0$ = maximum information rate for a language
    $R$ = mean rate
    $D = R_0 - R$ = redundancy of a language
    T, R, S, etc. = secrecy systems
    $T_i$, $R_j$, $S_k$, etc. = particular transformations of these systems

11. Some Examples of Secrecy Systems

In this section a number of examples of ciphers will be given. These will often be referred to in the remainder of the paper for illustrative purposes.

1. Simple Substitution Cipher.

In this cipher each letter of the message is replaced by a fixed substitute, usually also a letter. Thus the message

$$M = m_1 m_2 m_3 m_4 \cdots$$

becomes

$$E = e_1 e_2 e_3 e_4 \cdots = f(m_1) f(m_2) f(m_3) f(m_4) \cdots$$

where the function $f(m)$ is a function with an inverse. The key is a permutation of the alphabet (when the substitutes are letters), e.g. X G ..., where X is the substitute for A, G is the substitute for B, etc.
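A simple substitution is easy to express as a table lookup. In the sketch below the 26-letter key is an arbitrary illustrative permutation whose first two letters match the X and G mentioned above; the function names are likewise illustrative, and only capital letters are handled.

```python
import string

ALPHABET = string.ascii_uppercase

# Illustrative key: the n-th letter of KEY is the substitute for the n-th
# letter of the alphabet (X for A, G for B, and so on).
KEY = "XGUACDTBFHRSLMQVYZWIEJOKNP"

def is_valid_key(key):
    """A key must be a permutation of the alphabet so that f has an inverse."""
    return sorted(key) == list(ALPHABET)

def encipher(message, key=KEY):
    """E = f(m1) f(m2) ...: replace each letter by its fixed substitute."""
    return message.translate(str.maketrans(ALPHABET, key))

def decipher(cryptogram, key=KEY):
    """Apply the inverse substitution."""
    return cryptogram.translate(str.maketrans(key, ALPHABET))

cryptogram = encipher("SECRECY")
print(is_valid_key(KEY))
print(cryptogram)
print(decipher(cryptogram))
```

Repeated letters of the message give repeated letters of the cryptogram, which is why letter frequencies survive this kind of encipherment.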
2. Transposition (Fixed Period d).

The message is divided into groups of length d and a permutation applied to the first group, the same permutation to the second group, etc. The permutation is the key and can be represented by a permutation of the first d integers. Thus for d = 5, we might have 2 3 1 5 4 as the permutation. This means that

$$m_1 m_2 m_3 m_4 m_5 m_6 m_7 m_8 m_9 m_{10} \cdots$$

becomes

$$m_2 m_3 m_1 m_5 m_4 m_7 m_8 m_6 m_{10} m_9 \cdots .$$

Sequential application of two or more transpositions will be called compound transposition. If the periods are $d_1, d_2, \ldots, d_s$ it is clear that the result is a transposition of period d, where d is the least common multiple of $d_1, d_2, \ldots, d_s$.

3. Vigenere, and Variations.

In this cipher the key consists of a series of d letters. These are written repeatedly below the message and the two added modulo 26 (considering the alphabet numbered from A = 0 to Z = 25). Thus

$$e_i = m_i + k_i \pmod{26}$$

where $k_i$ is of period d in the index i. For example with the key G A H we obtain

    message        N O W I S T H E ...
    repeated key   G A H G A H G A ...
    cryptogram     T O D O S A N E ...

The Vigenere of period 1 is called the Caesar cipher. It is a simple substitution in which each letter of M is advanced a fixed amount in the alphabet. This amount is the key, which may be any number from 0 to 25. The so-called Beaufort and Variant Beaufort are similar to the Vigenere, and encipher by the equations

$$e_i = k_i - m_i \pmod{26}$$

$$e_i = m_i - k_i \pmod{26}$$

respectively. The Beaufort of period one is called the reversed Caesar cipher.

The application of two or more Vigeneres in sequence will be called the compound Vigenere. It has the equation

$$e_i = m_i + k_i + l_i + \cdots + s_i \pmod{26}$$

where $k_i, l_i, \ldots, s_i$ in general have different periods. The period of their sum

$$k_i + l_i + \cdots + s_i ,$$

as in compound transposition, is the least common multiple of the individual periods.

4. Vernam System.*

When the Vigenere is used with an unlimited key, never repeating, we have the Vernam system, with

$$e_i = m_i + k_i \pmod{26},$$

the $k_i$ being chosen at random and independently among 0, 1, ..., 25. If the key is a meaningful text we have the "running key" cipher.

5. Bazeries Cylinder.

In this mechanical system 25 thick disks are used, each having a mixed alphabet stamped around the edge.
These disks can be arranged in any order on.a spindle,' and the par- ticular arrangement used constitutes the key.' With the disks in their proper order; a message, is- enciphered by turning the disks so that the message appears* on a,. line -.parallel to the axis of the spindle* Any. other line of letters may then be chosen for the cryptogram. 'To decipher^ the cryptogram is arrenged on a line end- the decipherer looks for another line which then makes sense. — *G. S. Vernam, "Cipher Printing Telegraph Systems for Secret Wire' and Radio Telegraphic Communications.'' Journal Ameri. Inst, of Elect. Eng., Vj ,'XLVy p#, ! 109-115, 1926. 6, Digram, Trigram, rnd N-gram substitution. Rather than substitute for letters one cnn substi for digrams, trigr^ms, etc. Genercl digram substitution i quires n key consisting of a permutation of the 26 2 digrar It can be represented by a table in which the row correspc to the first letter of the digram and the column to the se letter, entries in the table being the substitutes (usuall also digrams)* 7* Interrupted Key Vigenere. , The Vigenere and its variations can be used with interrupted key* • The sequence of key letters is -started e at irregularly spaced points* 7 Thus^ if the entire key sec isXPGH* TRS> one can Interrupt irregularly to get X .P OH F TI H X P Gfi ? lE'XPlPO » • • The points of interruption can be determined in various wt (1). Whenever a certain letter occurs in the clear »• (£). Whenever a certain letter occurs in the cryptogram. (3.) / interrupting letter, say J, can be reserved as a signal ar the encipherer Interrupts the key at his discretion, (4). signal is used end the decipherer loontes the interruption by the appearance of meaningless text in the decipherment, In place of starting the key again at ecoh. interruption or can omit letters of it or reverse the direction of progrer There ere many variations and combinations of these methoc 8. Single Mixed Alphabet Vigenere. 
This is a simple substitution followed by a Vigenere:

$e_i = f(m_i) + k_i$

The "inverse" of this system is a Vigenere followed by simple substitution:

$e_i = g(m_i + k_i)$
$m_i = g^{-1}(e_i) - k_i$

9. Vigenere with Progressing Key.

The period of a Vigenere can be expanded by adding a fixed number $t$ to the key at each appearance; thus the $n$th group is enciphered by the equation

$e_i = m_i + k_i + nt$

Also this can be varied by adding $t$ and $s$ alternately to the key, etc.

10. Matrix System.*

One method of n-gram substitution is to operate on successive n-grams with a matrix having an inverse. The letters are assumed numbered from 0 to 25, making them elements of an algebraic ring. From the n-gram $m_1\, m_2 \ldots m_n$ of message, the matrix $a_{ij}$ gives an n-gram of cryptogram

$e_i = \sum_{j=1}^{n} a_{ij}\, m_j, \qquad i = 1, \ldots, n$

The matrix $a_{ij}$ is the key, and deciphering is performed with the inverse matrix. The inverse matrix will exist if and only if the determinant $|a_{ij}|$ has an inverse element in the ring.

*See L. S. Hill, "Cryptography in an Algebraic Alphabet," American Math. Monthly, v. 36, No. 6, 1, 1929, pp. 306-312. Also "Concerning Certain Linear Transformation Apparatus of Cryptography," v. 38, No. 3, 1931, pp. 135-154.

11. The Playfair Cipher.

This is a particular type of digram substitution governed by a mixed 25 letter alphabet written in a 5 x 5 square. (The letter J is often dropped in cryptographic work; it is very infrequent, and when it occurs can be replaced by I.) Suppose the key square is as shown below

L Z Q C P
A G N O U
R D M I F
K Y H V S
X B T E W

The substitute for a digram AC, for example, is the pair of letters at the other corners of the rectangle defined by A and C, i.e. LO, the L taken first since it is above A. If the digram letters are on a horizontal line as RI, one uses the letters to their right DF; RF becomes DR. If the letters are on a vertical line, the letters below them are used. Thus PS becomes UW.
If the letters are the same nulls may be used to separate them or one may be omitted, etc.

12. Multiple Mixed Alphabet Substitution.

In this cipher there are a set of $d$ simple substitutions which are used in sequence. If the period $d$ is four

$m_1\, m_2\, m_3\, m_4\, m_5\, m_6 \ldots$ becomes $f_1(m_1)\, f_2(m_2)\, f_3(m_3)\, f_4(m_4)\, f_1(m_5)\, f_2(m_6) \ldots$

13. Autokey Cipher.

A Vigenere type system in which either the message itself or the resulting cryptogram is used for the "key" is called an autokey cipher. The encipherment is started with a "priming key" (which is the entire key in our sense) and continued with the message or cryptogram displaced by the length of the priming key as indicated below with the priming key COMET.

The message used as "key":

MESSAGE     S E N D S U P P L I E S ...
KEY         C O M E T S E N D S U P ...
CRYPTOGRAM  U S Z H L M T C O A Y H ...

The cryptogram used as "key":

MESSAGE     S E N D S U P P L I E S ...
KEY         C O M E T U S Z H L O H ...
CRYPTOGRAM  U S Z H L O H O S T S ...

14. Fractional Ciphers.

In these, each letter is first enciphered into two or more letters or numbers and these symbols are somehow mixed (e.g. by transposition). The result may then be retranslated into the original alphabet. Thus using a mixed 25 letter alphabet for the key we may translate letters into two digit quinary numbers by the table

    0 1 2 3 4
0   L Z Q C P
1   A G N O U
2   R D M I F
3   K Y H V S
4   X B T E W

Thus B becomes 41. After the resulting series of numbers is transposed in some way they are taken in pairs and translated back into letters.

15. Codes.

In codes words (or sometimes syllables) are replaced by substitute letter groups. Sometimes a cipher of one kind or another is applied to the result.

12. Valuations of Secrecy Systems

There are a number of different criteria that should be applied in estimating the value of a proposed secrecy system. The more important of these are:

1. Amount of Secrecy.
There are some systems that are perfect; the enemy is no better off after intercepting any amount of material than before. Other systems, although giving him some information, do not yield a unique "solution" to intercepted cryptograms. Among the uniquely solvable systems, there are wide variations in the amount of labor required to effect this solution and in the amount of material that must be intercepted to make the solution unique.

2. Size of Key.

The key must be transmitted by non-interceptible means from transmitting to receiving ends. Sometimes it must be memorized. It is desirable then to have the key as small as possible.

3. Complexity of Enciphering and Deciphering Operations.

These should, of course, be as simple as possible. If they are done manually, complexity leads to loss of time, errors, etc. If done mechanically, complexity leads to large expensive machines.

4. Propagation of Errors.

In certain types of secrecy systems an error of one letter in enciphering or transmission leads to a large amount of error in the deciphered text. The errors are spread out by the deciphering operation, causing the loss of much information and frequent need for repetition of the cryptogram. It is naturally desirable to minimize this error expansion.

5. Expansion of Message.

In some types of secrecy systems the size of the message is increased by the enciphering process. This undesirable effect may be seen in systems where one attempts to swamp out message statistics by the addition of many nulls, or where multiple substitutes are used. It also occurs in many "concealment" types of systems (which are not usually secrecy systems in the sense of our definition).

13. Equivalence Classes in the Key Space

It may happen that in a ciphering system two or more different keys, say keys 1, 2, and 7, are equivalent. By this we mean that for every M

$T_1 M = T_2 M = T_7 M$

These keys will not be considered as distinct but will be thrown into an equivalence class. It is clear that the cryptanalyst can never determine which particular one of these was used but only (at best) the class. The probability for the class is of course the sum of the probabilities of the different keys in the class.

As an example, in the Playfair cipher as described above, the following are equivalent key squares.

G H X P Y        E C I Z F
Z F E C I        N R D L O
L O N R D        V S Q T A
T A V S Q        W B M K U
K U W B M        X P Y G H

We can think of the possible equivalence classes in this case as arrangements of a 25 letter alphabet on a 5 x 5 square on an oriented torus. The number of different keys is not $25!$ but $25!/5^2 = 24!$.

When we say that two secrecy systems are the same we mean that they consist of the same set of transformations $T_i$ with the same message and cryptogram space (range and domain) and the same probabilities for the different keys (after any identical transformations are put in the same equivalence class).

14. The Algebra of Secrecy Systems

If we have two secrecy systems T and R we can often combine them in various ways to form a new secrecy system S. If T and R have the same domain (message space) we may form a kind of "weighted sum,"

$S = pT + qR$

where $p + q = 1$. This operation consists of first making a preliminary choice with probabilities $p$ and $q$ determining which of T and R is used. This choice is part of the key of S. After this is determined T or R is used as originally defined. The total key of S must specify which of T and R is used and which key of T (or R) is used.

If T consists of the transformations $T_1, \ldots, T_m$ with probabilities $p_1, \ldots, p_m$ and R consists of $R_1, \ldots, R_k$ with probabilities $q_1, \ldots, q_k$ then $S = pT + qR$ consists of the transformations $T_1, \ldots, T_m, R_1, \ldots, R_k$ with probabilities $pp_1, pp_2, \ldots, pp_m, qq_1, qq_2, \ldots, qq_k$ respectively.

More generally we can form the sum of a number of systems.

$S = p_1 T + p_2 R + \cdots + p_m U, \qquad \sum p_i = 1$

We note that any system T can be written as a sum of fixed operations

$T = p_1 T_1 + p_2 T_2 + \cdots + p_m T_m$

$T_i$ being a definite enciphering operation of T corresponding to key choice $i$, which has probability $p_i$.

A second way of combining two secrecy systems is by taking the "product", shown schematically in Fig. 8. Suppose T and R are two systems and the domain (language space) of T can be identified with the range (cryptogram space) of R. Then we can apply first R to our language and then T to the result of this enciphering process. This gives a resultant operation S which we write as a product

$S = T R$

The key for S consists of both keys of T and R which are assumed chosen according to their original probabilities and independently. Thus if the $m$ keys of T are chosen with probabilities $p_1, p_2, \ldots, p_m$ and the $n$ keys of R have probabilities $p'_1, p'_2, \ldots, p'_n$ then S has $mn$ keys (at most; there may and often will be equivalence classes) with probabilities $p_i p'_j$. This type of product encipherment is often used; for example one follows a substitution by a transposition or a transposition by a Vigenere, or applies a code to the text and enciphers the result by substitution, transposition, fractionation, etc.

A more special type of product may be defined in case both T and R have keys of the same size which may be thrown in one-to-one correspondence with the same probabilities for corresponding keys. This may be called the "inner product," in contrast with the above which may be more completely described as an "outer product" (these names are derived from a rough analogy with the concepts of tensor analysis). In the inner product, written $S = T \circ R$ and indicated schematically in Fig.
9, the same key (or corresponding keys) are used for both T and R chosen with the common probability. For example one may construct a transposition cipher whose key is a permutation of the alphabet, each permutation being equally likely, and apply first this and then a substitution based on the same permutation. One also sees this situation in certain geometrical types of transposition ciphers where the text is written into a square and a permutation based on a key word applied first to the columns and then the rows of the square.

It may be noted that multiplication (either kind) is not in general commutative (we do not always have $RS = SR$), although in special cases such as substitution and transposition it is. Since it represents an operation it is definitionally associative. That is $R(ST) = (RS)T = RST$. Furthermore we have the laws

$p(p'T + q'R) + qS = pp'T + pq'R + qS$
(weighted associative law for addition)

$T(pR + qS) = pTR + qTS$
$(pR + qS)T = pRT + qST$
(right and left hand distributive laws)

and

$p_1 T + p_2 T + p_3 R = (p_1 + p_2)T + p_3 R$

Finally with regard to this algebraic structure of secrecy operations, we note that every closed secrecy system T has an "inverse" $T'$ obtained by interchanging the E and M spaces, with key probabilities the same, and

$(T R S)' = S' R' T'$
$(pT + qR)' = pT' + qR'$

Note that $T T'$ is not in general the identity (this is the reason we do not write $T^{-1}$).

A system whose M and E spaces can be identified, a very common case as when letter sequences are transformed into letter sequences, may be termed endomorphic. An endomorphic system T may be raised to a power $T^n$.

A secrecy system T whose outer product with itself is equal to T, i.e. for which

$T T = T$

will be called idempotent. For example simple substitution, transposition of period $p$, Vigenere of period $p$ (all with each key equally likely) are idempotent.
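The idempotence claim can be checked by brute force on a small model. The following is only a sketch under stated assumptions: a toy alphabet of 3 letters, messages of length 2, and a system represented as a dictionary from transformations to key probabilities. The names `shift_table` and `outer_product` are illustrative and do not come from the memorandum.

```python
from fractions import Fraction
from itertools import product

N = 3   # toy alphabet size (an assumption; the text uses 26 letters)
D = 2   # period, and also the message length in this toy model
MESSAGES = list(product(range(N), repeat=D))
INDEX = {msg: i for i, msg in enumerate(MESSAGES)}

def shift_table(key):
    """One Vigenere transformation, stored as its outputs on every message."""
    return tuple(tuple((m + k) % N for m, k in zip(msg, key)) for msg in MESSAGES)

def outer_product(t_sys, r_sys):
    """System T R: apply R first, then T, with keys chosen independently."""
    result = {}
    for t_table, p in t_sys.items():
        for r_table, q in r_sys.items():
            composed = tuple(t_table[INDEX[out]] for out in r_table)
            # Identical transformations fall into one equivalence class.
            result[composed] = result.get(composed, 0) + p * q
    return result

keys = list(product(range(N), repeat=D))
V = {shift_table(key): Fraction(1, len(keys)) for key in keys}  # all keys equally likely
fixed = {shift_table((1, 0)): Fraction(1)}                      # a single fixed key

print(outer_product(V, V) == V)              # True: V V = V
print(outer_product(fixed, fixed) == fixed)  # False: two shifts by 1 give a shift by 2
```

Merging identical composed transformations mirrors the equivalence-class convention of section 13, and it is what allows the two systems to be compared with a plain dictionary equality.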
The set of all endomorphic secrecy systems defined in a fixed message space constitute an "algebraic variety," that is, a kind of algebra, using the operations of addition and multiplication. In fact, the properties of addition and multiplication which we have discussed lead to the following result.

Theorem 1: The set of endomorphic ciphers with the same message space and the two combining operations of weighted addition and outer multiplication form a linear associative algebra with a unit element, apart from the fact that the coefficients in a weighted addition must be non-negative and sum to unity.

It should be emphasized that these combining operations of addition and multiplication apply to secrecy systems as a whole. The product of two systems $TR$ should not be confused with the product of the transformations in the systems $T_i R_j$, which also appears often in this work. The former $TR$ is a secrecy system, i.e. a set of transformations with associated probabilities; the latter is a particular transformation. Further the sum of two systems $pR + qT$ is a system; the sum of two transformations is not defined. The systems T and R may commute without the individual $T_i$ and $R_j$ commuting, e.g. if R is a Beaufort system of a given period, all keys equally likely,

$R_i R_j \neq R_j R_i$

in general, but of course $RR$ does not depend on its order; actually

$RR = V$

the Vigenere of the same period with random key. On the other hand, if the individual $T_i$ and $R_j$ of two systems T and R commute, then the systems commute.

It is rather surprising to find an algebraic variety with as much structure as a linear associative algebra in which the elements have the complexity of ciphers. In Hilbert space theory, for example, one has a linear associative algebra, but the elements of the algebra are transformations. Here the elements are sets of transformations with a probability space associated with the transformation parameter.
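The Beaufort remark can be illustrated numerically. This is a minimal sketch assuming period one (the reversed Caesar cipher) and the 26 letter alphabet; the helper names `beaufort`, `compose` and `table` are hypothetical. It shows two individual Beaufort transformations failing to commute, while the product system $RR$ coincides with the Caesar (period one Vigenere) system with random key.

```python
from collections import Counter
from fractions import Fraction

N = 26  # alphabet size, letters numbered 0..25

def beaufort(k):
    """Beaufort transformation of period one: e = k - m (mod 26)."""
    return lambda m: (k - m) % N

def compose(f, g):
    """Return the transformation 'apply g first, then f'."""
    return lambda m: f(g(m))

def table(f):
    """Store a transformation as the tuple of its outputs on every letter."""
    return tuple(f(m) for m in range(N))

# Two individual Beaufort transformations do not commute in general.
r3, r10 = beaufort(3), beaufort(10)
print(table(compose(r3, r10)) == table(compose(r10, r3)))  # False

# The system RR: both keys chosen independently and uniformly.
rr = Counter()
for i in range(N):
    for j in range(N):
        rr[table(compose(beaufort(i), beaufort(j)))] += Fraction(1, N * N)

# The Vigenere of period one with random key.
vigenere = {table(lambda m, k=k: (m + k) % N): Fraction(1, N) for k in range(N)}
print(dict(rr) == vigenere)  # True
```

Composing key $i$ after key $j$ gives $m + (i - j)$, a plain shift, which is why the set of products and their probabilities match the Vigenere system exactly.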
These combining operations give us ways of constructing many new types of secrecy systems from certain ones, such as the examples given. We may also use them to describe the situation facing a cryptanalyst when attempting to solve a cryptogram of unknown type. He is, in fact, solving a secrecy system of the type

$T = p_1 A + p_2 B + \cdots + p_r S + p' X, \qquad \sum p = 1$

where the A, B, ..., S are known types of ciphers, with the $p_i$ their a priori probabilities in this situation, and $p' X$ corresponds to the possibility of a completely new unknown type of cipher.

In weighted addition the key size of the result is given by

$|K| = p|K_1| + q|K_2| - (p \log p + q \log q) = p|K_1| + q|K_2| + |K_3|$

i.e. the weighted mean of the two keys plus the size of the $p$, $q$ key. This is only in case there are no equivalences; if there are it will always be less. For the outer product the key size is

$|K| \le |K_1| + |K_2|$

with equality only when there are no equivalences. In the inner product

$|K| \le |K_1| = |K_2|$

with equality under the same condition.

15. Pure and Mixed Ciphers

Certain types of ciphers, such as the simple substitution, the transposition of a given period, the Vigenere of a given period, the mixed alphabet Vigenere, etc. (all with each key equally likely) have a certain homogeneity with respect to key. Whatever the key, the enciphering, deciphering and decrypting processes are essentially the same. This may be contrasted with the cipher

$pS + qT$

where S is a simple substitution and T a transposition of given period. In this case the entire system changes for enciphering, deciphering and decryptment, depending on whether the substitution or transposition was used.

The cause of the homogeneity in certain ciphers stems from the group property; we notice that in the above examples of homogeneous ciphers the product of any two transformations in the set $T_i T_j$ is equal to a third transformation $T_k$
in the set, while $T_i S_j$ does not equal any transformation in the cipher

$pS + qT$

which contains only substitutions and transpositions, no products.

We might define a "pure" cipher, then, as one whose $T_i$ formed a group. This, however, would be too restrictive since it requires that the E space be the same as the M space, i.e. that the system be endomorphic. The fractional transposition is as homogeneous as the ordinary transposition without being endomorphic. The proper definition is the following: A cipher T is pure if for every $T_i$, $T_j$, $T_k$ there is a $T_s$ such that

$T_i T_j^{-1} T_k = T_s$

and every key is equally likely. Otherwise the cipher is mixed. The systems of Fig. 7 are mixed. Fig. 10 is pure if all keys are equally likely.

Theorem 2: In a pure cipher the operations $T_i^{-1} T_j$ which transform the message space into itself form a group whose order is $m$, the number of different keys.

For

$T_j^{-1} T_k\, T_k^{-1} T_j = I$

so that each element has an inverse, also the associative law is true since these are operations, and the group property follows from

$T_i^{-1} T_j\, T_k^{-1} T_l = T_s^{-1} T_k\, T_k^{-1} T_l = T_s^{-1} T_l$

using our assumption that $T_i^{-1} T_j = T_s^{-1} T_k$ for some $s$.

The operation $T_i^{-1} T_j$ means, of course, enciphering the message with key $j$ and then deciphering with key $i$ which brings us back to the message space. If T is endomorphic, i.e. the $T_i$ themselves transform the space M into itself (as is the case with most ciphers, where both the message space and the cryptogram space consist of sequences of letters) and the $T_i$ are a group and equally likely, then T is pure, since

$T_i T_j^{-1} T_k = T_i T_r = T_s$

Theorem 3: The outer product of two pure ciphers which commute is pure.

For if T and R commute $T_i R_j = R_l T_m$ for every $i$, $j$ with suitable $l$, $m$, and

$T_i R_j (T_k R_l)^{-1} T_m R_n = T_i R_j R_l^{-1} T_k^{-1} T_m R_n = R_u R_v^{-1} R_w\, T_r T_s^{-1} T_t = R_h T_g$
The commutation condition is not necessary, however, for the product to be a pure cipher.

A system with only one key, i.e. a single definite operation $T_1$, is pure, since the only choice of indices is

$T_1 T_1^{-1} T_1 = T_1$

Thus the expansion of a general cipher into a sum of such simple transformations also exhibits it as a sum of pure ciphers.

An examination of the example of a pure cipher shown in Fig. 5 discloses certain properties. The messages fall into certain subsets which we will call residue classes, and the possible cryptograms are divided into corresponding residue classes. There is at least one line from each message in a class to each cryptogram in the corresponding class, and no line between classes which do not correspond. The number of messages in a class is a divisor of the total number of keys. The number of lines "in parallel" from a message M to a cryptogram in the corresponding class is equal to the number of keys divided by the number of messages in the class containing the message (or cryptogram). It is shown in the appendix that these hold in general for pure ciphers. Summarized in a more formal statement we have

Theorem 4: In a pure system the messages can be divided into a set of "residue classes" $C_1, C_2, \ldots, C_s$ and the cryptograms into a corresponding set of residue classes $C'_1, C'_2, \ldots, C'_s$ with the following properties:

(1) The message residue classes are mutually exclusive and collectively contain all possible messages. Similarly for the cryptogram residue classes.
(2) Enciphering any message in $C_i$ with any key produces a cryptogram in $C'_i$. Deciphering any cryptogram in $C'_i$ with any key leads to a message in $C_i$.
(3) The number of messages in $C_i$, say $\varphi_i$, is equal to the number of cryptograms in $C'_i$ and is a divisor of $k$ the number of keys.
(4) Each message in $C_i$ can be enciphered into each cryptogram in $C'_i$ by exactly $k/\varphi_i$ different keys. Conversely for decipherment.
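The four properties of Theorem 4 can be verified by enumeration on a small pure cipher. The sketch below assumes a toy system chosen for illustration: transposition of period 3 with all 6 permutations equally likely, acting on messages of length 3 over a two letter alphabet. The function and variable names are hypothetical.

```python
from collections import defaultdict
from itertools import permutations, product

D = 3
KEYS = list(permutations(range(D)))       # k = 6 keys, assumed equally likely
MESSAGES = list(product("AB", repeat=D))  # toy message space of 8 messages

def encipher(msg, key):
    """Transposition: position i of the cryptogram takes letter key[i] of the message."""
    return tuple(msg[i] for i in key)

# Messages that reach the same set of cryptograms share a residue class.
classes = defaultdict(list)
for msg in MESSAGES:
    reachable = frozenset(encipher(msg, key) for key in KEYS)
    classes[reachable].append(msg)

for cryptograms, messages in classes.items():
    phi = len(messages)
    # Property (3): equal class sizes, and phi divides the number of keys.
    assert len(cryptograms) == phi and len(KEYS) % phi == 0
    # Property (4): exactly k / phi keys join each message to each cryptogram.
    for m in messages:
        for e in cryptograms:
            assert sum(encipher(m, key) == e for key in KEYS) == len(KEYS) // phi
    print(sorted("".join(m) for m in messages), "keys per line:", len(KEYS) // phi)
```

In this model the residue classes are the sets of messages with the same number of each letter, so the classes have sizes 1, 3, 3 and 1, joined by 6, 2, 2 and 6 parallel keys respectively.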
The importance of the concept of a pure cipher (and the reason for the name) lies in the fact that for them all keys are essentially the same. Whatever key is used for a particular message, the a posteriori probabilities of all messages are identical. To see this, note that two different keys applied to the same message lead to two cryptograms in the same residue class, say $C'_i$. The two cryptograms therefore could each be deciphered by $k/\varphi_i$ keys into each message in $C_i$ and into no other possible messages. All keys being equally likely the a posteriori probabilities of various messages are thus

$P_E(M) = \dfrac{P(M)\, P_M(E)}{P(E)} = \dfrac{P(M)\, P_M(E)}{\sum_M P(M)\, P_M(E)} = \dfrac{P(M)}{P(C_i)}$

where M is in $C_i$, E is in $C'_i$ and the sum is over all messages in $C_i$. If E and M are not in corresponding residue classes, $P_E(M) = 0$. Similarly it can be shown that the a posteriori probabilities of the different keys are the same in value but these values are associated with different keys when a different key is used. The same set of values of $P_E(K)$ have undergone a permutation among the keys. Thus we have the result

Theorem 5: In a pure system the a posteriori probabilities of various messages $P_E(M)$ are independent of the key that is chosen. The a posteriori probabilities of the keys $P_E(K)$ are the same in value but undergo a permutation with a different key choice.

Roughly we may say that any key choice leads to the same cryptanalytic problem in a pure cipher. Since the different keys all result in cryptograms in the same residue class this means that all cryptograms in the same residue class are cryptanalytically equivalent; they lead to the same a posteriori probabilities of messages and, apart from a permutation, the same probabilities of keys.

As an example of this, simple substitution with all keys equally likely is a pure cipher. The residue class corresponding to a given cryptogram E is the set of all cryptograms that may be obtained from E by operations $T_j T_k^{-1} E$. In this case $T_j T_k^{-1}$ is itself a substitution and hence any
substitution on E gives another member of the same residue class. Thus if the cryptogram is

E = X C P P G C F Q

then

E_1 = R D H H G D S N
E_2 = A B C C D B E F

etc. are in the same residue class. It is obvious in this case that these cryptograms are essentially equivalent. All that is of importance in a simple substitution with random key is the pattern of letter repetitions, the actual letters being dummy variables. Indeed we might dispense with them entirely, indicating the pattern of repetitions in E as follows:*

[Figure: diagram of the repetition pattern]

This notation describes the residue class but eliminates all information as to the specific member of the class. Thus it leaves precisely that information which is cryptanalytically pertinent. This is related to one method of attacking simple substitution ciphers, the method of pattern words.

*Suggested by a notation used by Quine in Symbolic Logic.

In the Caesar type cipher only the first differences mod 26 of the cryptogram are significant. Two cryptograms with the same $\Delta e_i$ are in the same residue class. One breaks this cipher by the simple process of writing down the 26 members of the message residue class and picking out the one which makes sense.

The Vigenere of period $d$ with random key is another example of a pure cipher. Here the message residue class consists of all sequences with the same first differences for letters separated by distance $d$ as the cryptogram. For $d = 3$ the residue class is defined by

$m_1 - m_4 = e_1 - e_4$
$m_2 - m_5 = e_2 - e_5$
$m_3 - m_6 = e_3 - e_6$
$m_4 - m_7 = e_4 - e_7$
...

where $E = e_1, e_2, \ldots$ is the cryptogram and $m_1, m_2, \ldots$ is any M in the corresponding residue class.

In the transposition cipher of period $d$ with random key, the residue class consists of all arrangements of the $e_i$ in which no $e_i$ is moved out of its block of length $d$, and any two $e_i$ at a distance $d$ remain at this distance. This is used in breaking these ciphers as follows.
The cryptogram is written in successive blocks of length $d$, one under another as below ($d = 5$):

$e_1 \;\; e_2 \;\; e_3 \;\; e_4 \;\; e_5$
$e_6 \;\; e_7 \;\; e_8 \;\; e_9 \;\; e_{10}$
$e_{11} \;\; e_{12} \;\; \ldots$

The columns are then cut apart and rearranged to make sense. When the columns are cut apart, the only information remaining is the residue class of the cryptogram.

Theorem 6: If T is pure then $T_i T_j^{-1} T = T$ where $T_i$, $T_j$ are any two transformations of T. Conversely if this is true for any $T_i$, $T_j$ in a system T then T is pure.

The first part of this theorem is obvious from the definition of a pure system. To prove the second part we note first that if $T_i T_j^{-1} T = T$ then $T_i T_j^{-1} T_s$ is a transformation of T. It remains to show that all keys are equiprobable. We have $T = \sum_s p_s T_s$ and

$\sum_s p_s\, T_i T_j^{-1} T_s = \sum_s p_s T_s$

The term in the left hand sum with $s = j$ yields $p_j T_i$. The only term in $T_i$ on the right is $p_i T_i$. Since all coefficients are non-negative it follows that

$p_j \le p_i$

The same argument holds with $i$ and $j$ interchanged and consequently

$p_j = p_i$

and T is pure. Thus the condition that $T_i T_j^{-1} T = T$ might be used as an alternative definition of a pure system.

The property of purity in a system is connected with idempotence. Thus consider the system $S = T T'$ where T is pure. We have

$T_i T_j^{-1} T_s T_r^{-1} = T_i T_t^{-1} T_r T_r^{-1} = T_i T_t^{-1}$

so that the transformations of $S^2$ are the same as those of S, and since both S and $S^2$ are pure we have $S = S^2$.

Theorem 7: If T is pure, $S = T T'$ is pure and $S^2 = S$.

An endomorphic system T which satisfies the condition $T_i T_j = T_s$ (but not necessarily with all key probabilities equal) can be shown to approach a pure cipher on raising to a high power, namely the one with the same transformations, but with all probabilities equalized. In fact the probabilities for $T^{n+1}$ are derived from those for $T^n$ by a Markoff process, of a special type due to the group property. This special type always approaches the limit of equalized probabilities. This same argument applies more generally.
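The Markoff process described above can be followed numerically for a small case. This is a sketch under assumptions not in the memorandum: a Caesar type system on a 5 letter alphabet whose keys 0, 1, 2 have the unequal probabilities 0.6, 0.3, 0.1. Because the product of shifts $i$ and $j$ is the shift $i + j$, the key probabilities of $T^{n+1}$ are the cyclic convolution of those of $T^n$ with those of T.

```python
N = 5                           # toy alphabet size (an assumption)
p = [0.6, 0.3, 0.1, 0.0, 0.0]   # assumed unequal key probabilities for T

def times(a, b):
    """Key probabilities of the product of two Caesar type systems."""
    out = [0.0] * N
    for i, pi in enumerate(a):
        for j, pj in enumerate(b):
            out[(i + j) % N] += pi * pj   # shift i combined with shift j is shift i + j
    return out

power = p[:]                    # probabilities for T^1
for n in range(2, 61):          # raise to T^2, T^3, ..., T^60
    power = times(power, p)

print([round(x, 6) for x in power])
```

For this particular choice of probabilities the printed values settle near 1/5 for every key, which is the equalized limit the text describes.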
We have

Theorem 8: Let T be any endomorphic cipher. If $T^n$ approaches any limit at all, which will necessarily occur if all the transformations of $T^n$ lie in a finite set (no matter how large $n$) and the transformations of T include the identity, then this limit will be a pure cipher.

As an example consider the cipher

$R = pT + qS$

where T is transposition with random key and S substitution with random key. We have

$S^2 = S, \qquad T^2 = T, \qquad ST = TS$

and hence any product of T's and S's such as T S T T S S reduces to $ST$. Thus

$R^n = p^n T + q^n S + (1 - p^n - q^n)\, ST$

As $n \to \infty$ the first two terms approach zero and

$\lim_{n \to \infty} R^n = ST$

The concepts of pure and mixed languages and pure and mixed ciphers have an application in practical cryptanalysis, if we interpret them somewhat loosely. When a cryptographer starts work on a cryptogram, his first job is to determine the original language. Approximately then he is determining the pure component of the general language space

$L = p_1 L_1 + p_2 L_2 + \cdots + p_n L_n$

where $L_1$ say is English, $L_2$ German, etc. Of course these are not pure but the different components of them are fairly close together in statistical structure.

The second thing a cryptographer does is to determine the "type" of cipher that was used; usually this is about the same as finding the pure component in the general cipher system

$R = p_1 S + p_2 T + p_3 V + \cdots$

where S say is simple substitution, T is transposition, etc. A Vigenere V of unknown period is not a pure cipher but the decomposition

$V = p_1 V_1 + p_2 V_2 + p_3 V_3 + \cdots$

where $V_i$ is of period $i$, is into pure components (if all keys are equally likely for any period). In solving a Vigenere the first problem is to determine the period. The same is true in transposition. The reason for this initial isolation of pure or nearly pure language and cipher is that only then can a simple meaningful statistical analysis be carried out.

16. Involutory Systems

If every transformation in a system T is its own
inverse, i.e. if

$T_i T_i = I$

for every $i$, the system will be called involutory. Such systems are important practically since the enciphering and deciphering operations are then identical. This leads to simplified instructions to cryptographic clerks in manual operation, or in mechanical cases the same machine with the same key setting may be used for both operations.

Examples: In simple substitution we may limit our transformations to those in which when letter $\theta$ is the substitute for $\varphi$, $\varphi$ is the substitute for $\theta$. Another example is the Beaufort cipher.

If T is involutory, so is the system whose operations are

$S_j^{-1} T_i S_j$

since

$S_j^{-1} T_i S_j\, S_j^{-1} T_i S_j = I$

17. Similar and Weakly Similar Systems

Two secrecy systems R and S will be said to be similar if there exists a transformation A having an inverse $A^{-1}$ such that

$R = A S$

This means that enciphering with R is the same as enciphering with S and then operating on the result with the transformation A. If we write $R \approx S$ to mean R is similar to S then it is clear that $R \approx S$ implies $S \approx R$. Also $R \approx S$ and $S \approx T$ imply $R \approx T$ and finally $R \approx R$. These are summarized in mathematical terminology by saying that similarity is an equivalence relation.

The cryptographic significance of similarity is that if $R \approx S$ then R and S are equivalent from the cryptanalytic point of view. Indeed if a cryptanalyst intercepts a cryptogram in system S he can transform it to one in system R by merely applying the transformation A to it. A cryptogram in system R is transformed to one in S by applying $A^{-1}$. If R and S are applied to the same language or message space, there is a one-to-one correspondence between the resulting cryptograms. Corresponding cryptograms give the same distribution of a posteriori probabilities for all messages.

If one has a method of breaking the system R then any system S similar to R can be broken by reducing to R through application of the operation A.
This is a device that is frequently used in practical cryptanalysis.

Examples: As a trivial example, simple substitution where the substitutes are not letters but arbitrary symbols is similar to simple substitution using letter substitutes. A second example is the Caesar and the reversed Caesar type ciphers. The latter is sometimes broken by first transforming into a Caesar type. The Vigenere, Beaufort and Variant Beaufort are all similar, when the key is random. The "autokey" cipher primed with the key $K_1 K_2 \ldots K_d$ is similar to a Vigenere type with the key alternately added and subtracted mod 26. The transformation A in this case is that of "deciphering" the autokey with a series of $d$ A's for the priming key.

Two systems R and S are weakly similar if there exist two transformations A and B having inverses $A^{-1}$ and $B^{-1}$ with

$R = A S B$

This means that system R is the same as applying first B to the language, then S, and finally A. This relation is also an equivalence relation. Finding a method of solution for system R with language L is equivalent to finding a solution for S with language $BL$.

We may note that if R is pure and S is weakly similar to R then S is pure. This follows from

$R_i R_j^{-1} R_k = R_t$
$R_i = A S_i B, \qquad R_j^{-1} = B^{-1} S_j^{-1} A^{-1}, \qquad R_k = A S_k B$

where we assume corresponding transformations in R and S to have the same subscripts. Hence

$R_i R_j^{-1} R_k = A\, S_i S_j^{-1} S_k\, B = R_t$
$S_i S_j^{-1} S_k = A^{-1} R_t B^{-1} = S_t$

and S is therefore pure.

PART II

Theoretical Secrecy

Introduction

We now consider problems connected with the "theoretical secrecy" of a system. How immune is a system to cryptanalysis when the cryptanalyst has unlimited time and manpower available for the analysis of cryptograms? Does a cryptogram have a unique solution (even though it may require an impractical amount of work to find it) and if not how many reasonable solutions does it have?
How much text in a given system must be intercepted before the solution becomes unique? Are there systems which never become unique in solution no matter how much enciphered text is intercepted? Are there systems for which no information whatever is given to the enemy no matter how much text is intercepted?

18. Perfect Secrecy

Let us suppose the possible messages are finite in number $M_1, \ldots, M_n$ and have a priori probabilities $P(M_1), \ldots, P(M_n)$, and that these are enciphered into the possible cryptograms $E_1, \ldots, E_m$ by

$E = T_i M$

The cryptanalyst intercepts a particular E and can then calculate the a posteriori probabilities for the various messages, $P_E(M)$. It is natural to define perfect secrecy by the condition that for all E, the a posteriori probabilities are equal to the a priori probabilities independently of the values of these. In this case, intercepting the message has given the cryptanalyst no information.* Any action of his which depends on the information contained in the cryptogram cannot be altered, for all of his probabilities as to what the cryptogram contains remain unchanged. On the other hand, if the condition is not satisfied there will exist situations in which the enemy has certain a priori probabilities, and certain key and messages are chosen where the enemy's probabilities do change. This in turn may affect his actions and thus perfect secrecy has not been obtained. Hence the definition given is necessarily required by our ideas of what perfect secrecy should mean.

*A purist might object that the enemy has obtained a bit of information in that he knows a message was sent. This may be answered by having among the messages a "blank" corresponding to "no message." If no message is originated the blank is enciphered and sent as a cryptogram. Then even this modicum of remaining information is eliminated.
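The definition of perfect secrecy can be tested directly on a small system by computing the a posteriori probabilities with Bayes' theorem. The sketch below assumes a toy system with 5 messages, an arbitrary non-uniform prior, equally likely keys, and encipherment by addition mod 5; the names `posteriors` and `is_perfect` are illustrative only.

```python
from fractions import Fraction

def posteriors(prior, keys, encipher):
    """Return {E: {M: P_E(M)}}, assuming every key is equally likely."""
    joint = {}
    for m, pm in prior.items():
        for k in keys:
            e = encipher(m, k)
            row = joint.setdefault(e, {})
            row[m] = row.get(m, Fraction(0)) + pm * Fraction(1, len(keys))
    # Divide each joint probability P(M, E) by P(E) to get P_E(M).
    return {e: {m: p / sum(row.values()) for m, p in row.items()}
            for e, row in joint.items()}

def is_perfect(prior, keys, encipher):
    """True if P_E(M) equals P(M) for every cryptogram E and message M."""
    post = posteriors(prior, keys, encipher)
    return all(row.get(m, 0) == pm for row in post.values() for m, pm in prior.items())

n = 5
prior = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 8),
         3: Fraction(1, 16), 4: Fraction(1, 16)}   # assumed a priori probabilities
shift = lambda m, k: (m + k) % n

print(is_perfect(prior, range(n), shift))  # True: as many keys as messages
print(is_perfect(prior, range(2), shift))  # False: two keys leak information
```

With only two keys, the cryptogram 0 can arise only from messages 0 and 4, so the posterior for message 0 rises from 1/2 to 8/9 and the condition fails.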
A necessary and sufficient condition for perfect secrecy can be found as follows. We have by Bayes' theorem

$$P_E(M) = \frac{P(M)\, P_M(E)}{P(E)}$$

and this must equal $P(M)$ for perfect secrecy. Hence either $P(M) = 0$, a solution that must be excluded since we demand the equality independent of the values of $P(M)$, or

$$P_M(E) = P(E)$$

for every $M$ and $E$. Conversely if $P_M(E) = P(E)$ then

$$P_E(M) = P(M)$$

and we have perfect secrecy. Thus we have the result:

Theorem 9: A necessary and sufficient condition for perfect secrecy is that
$$P_M(E) = P(E)$$
for all $M$ and $E$. That is, $P_M(E)$ must be independent of $M$.

The probability of all keys that transform $M_i$ into a given cryptogram $E$ is equal to that of all keys transforming $M_j$ into the same $E$.

Now there must be as many $E$'s as there are $M$'s, since, fixing $i$, $T_i$ gives a one-to-one correspondence between all the $M$'s and some of the $E$'s. For perfect secrecy $P_M(E) = P(E) \ne 0$ for any of these $E$'s and any $M$. Hence there is at least one key transforming any $M$ into any of these $E$'s. But all the keys from a fixed $M$ to different $E$'s must be different, and therefore the number of different keys is at least as great as the number of $M$'s. It is possible to obtain perfect secrecy with no more, as one shows by the following example. Let the $M_i$ be numbered 1 to $n$ and the $E_i$ the same, and using $n$ keys let

$$T_i M_j = E_s$$

where $s = i + j \pmod{n}$. In this case we see that $P_E(M) = \frac{1}{n} = P(E)$ and we have perfect secrecy. An example is shown with $n = 5$.

These perfect systems in which the number of cryptograms, the number of messages, and the number of keys are all equal are characterized by the properties that (1) each $M$ is connected to each $E$ by exactly one line, (2) all keys are equally likely. Thus the three matrix representations of the system are "latin squares".

We have then concealed completely an amount of information at most $\log n$ with a size of key $\log n$.
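The modular example above can be checked directly. The following is a minimal sketch, assuming $n$ equally likely keys and an arbitrary set of a priori message probabilities; the function name `posterior_given_cryptogram` and the particular priors are illustrative choices, not part of the original text. It uses exact fractions so that the comparison of a posteriori and a priori probabilities is not blurred by rounding.

```python
from fractions import Fraction


def posterior_given_cryptogram(priors, n):
    """Return P_E(M) for each cryptogram E in the system T_i M_j = E_s, s = (i + j) mod n.

    priors[j] is the a priori probability P(M_j); the n keys are taken as
    equally likely (an assumption matching property (2) in the text).
    """
    key_prob = Fraction(1, n)
    posteriors = {}
    for e in range(n):
        joint = []
        for j in range(n):
            # Count the keys i that send message j to cryptogram e (exactly one here).
            keys = [i for i in range(n) if (i + j) % n == e]
            joint.append(priors[j] * key_prob * len(keys))
        p_e = sum(joint)  # P(E)
        posteriors[e] = [p / p_e for p in joint]  # Bayes' theorem
    return posteriors


# Deliberately non-uniform priors (hypothetical values for illustration).
priors = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 16), Fraction(1, 16)]
post = posterior_given_cryptogram(priors, 5)
perfect = all(post[e] == priors for e in post)
print(perfect)
```

Whatever priors are supplied, each cryptogram leaves them unchanged, which is the condition of Theorem 9.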
The perfect system just described is the first example of a general principle which we will often see, that there is a limit to what we can obtain with a given key size: the amount of uncertainty we can introduce into the solution of a cryptogram cannot be greater than the key size. Here we have concealed all the information but the key size is as large as the message space.

We now consider the case where $|M|$ is infinite: suppose the message is generated as an unending sequence of letters by a Markoff process. The maximum rate of this source is $R_0$. It is clear from our results above that no finite key will give perfect secrecy. We suppose then that the key source generates key also in the same manner, i.e. as an infinite sequence of symbols with a mean rate $R_K$. Suppose that only a certain length of key $L_K$ is needed to encipher and decipher a length $L_M$ of message.

Theorem 10: For perfect secrecy (when the a priori probabilities of various messages can be anything), for large $L$,
$$R_0 L_M \le R_K L_K$$
and the rate $\left(R_0 \frac{L_M}{L_K} + \epsilon\right)$ is asymptotically sufficient.

This may be proved by essentially the same method as the finite case. This case is realized by the Vernam system.

These results have been deduced on the basis of unknown or arbitrary a priori probabilities for the messages. The key required for perfect secrecy depends then on the total number of possible messages, or on the maximum rate $R_0$ of the message source.

One would suspect that if the message space has fixed known statistics, so that it has a definite mean rate $R$ of generating information, then the amount of key needed could be reduced in an average sense in just this ratio $\frac{R}{R_0}$, and this is indeed true. In fact the message can be passed through a transducer which transforms it into a normal form and reduces the expected length in just this ratio, and then a Vernam system may be applied to the result.
Evidently the amount of key per letter of message is statistically reduced by a factor $\frac{R}{R_0}$, and in this case the key source and information source are just matched: an alternative of key conceals an alternative of information. It is easily seen also, by the methods used in the "Information" paper, that this is the best that can be done.

Theorem 11: Perfect secrecy (omitting the condition of independence of a priori probabilities) for a source with fixed statistics and a rate $R$ of generating information can be achieved with a key source which generates at the rate $\left(R \frac{L_M}{L_K} + \epsilon\right)$, where $L_M$ and $L_K$ are message and key lengths which correspond. A rate less than $R \frac{L_M}{L_K}$ is insufficient.

Perfect secrecy systems have a place in the practical picture. They may be used either where the greatest importance is attached to complete secrecy, e.g. correspondence between the highest levels of command, or in cases where the number of possible messages is small. Thus, to take an extreme example, if only two messages "yes" or "no" were anticipated a perfect system would be in order, with perhaps the transformation table

    M \ K     A     B
    yes       0     1
    no        1     0

The disadvantage of perfect systems for large correspondence systems is, of course, the equivalent amount of key that must be sent. In succeeding sections we consider what can be achieved with smaller key size, in particular with finite keys.

19. Equivocation

Let us suppose that a simple substitution cipher has been used on English text and that we intercept a certain amount, $N$ letters, of the enciphered text. For $N$ fairly large, more than say 50 letters, there is nearly always a unique solution to the cipher; i.e. a single good English sequence which transforms into the intercepted material by a simple substitution.
With smaller $N$, however, the chance of more than one solution is greater; with $N = 15$ there will generally be quite a number of possible fragments of text that would fit, while with $N = 8$ a good fraction (of the order of 1/8) of all reasonable English sequences of that length are possible, since there is seldom more than one repeated letter in the 8. With $N = 1$ any letter is clearly possible and has the same a posteriori probability as its a priori probability. For one letter the system is perfect.

This happens generally with solvable ciphers. Before any material is intercepted we can imagine the a priori probabilities attached to the various possible messages, and also to the various keys. As material is intercepted, the cryptanalyst calculates the a posteriori probabilities; and as $N$ increases the probabilities of certain messages increase, and of most decrease, until finally only one is left, which has a probability nearly one, while the total probability of all others is nearly zero.

This calculation can actually be carried out for simple systems. Table 1 shows the a posteriori probabilities for a Caesar type cipher applied to English text, with the key chosen at random from the 26 possibilities. To enable the use of standard letter, digram and trigram frequency tables, the text has been started at a random point (by opening a book and putting a pencil down at random on the page). The message selected in this way begins "creases to ...", starting inside the word increases.
If the message were to start with the beginning of a sentence a different set of probabilities must be used, corresponding to the frequencies of letters, digrams, etc., at the beginning of sentences.

The Caesar with random key is a pure cipher and the particular key chosen does not affect the a posteriori probabilities. To determine these we need merely list the possible decipherments by all keys and calculate their a priori probabilities. The a posteriori probabilities are these divided by their sum. These possible decipherments are found by the standard process of "running down the alphabet" from the message and are listed at the left. These form the residue class for the message. For one intercepted letter the a posteriori probabilities are equal to the a priori probabilities for letters and are shown in the column headed $N = 1$. For two intercepted letters the probabilities are those for digrams adjusted to sum to unity and these are shown in the column $N = 2$.

Table 1
A Posteriori Probabilities for a Caesar Type Cryptogram

    Decipherments    N = 1    N = 2    N = 3    N = 4
    CREAS            .032     .015     .111     .55
    DSFBT            .036     .068
    ETGCU            .123     .170
    FUHDV            .023     .023
    GVIEW            .016
    HWJFX            .051     .015
    IXKGY            .072
    JYLHZ            .001
    KZMIA            .005
    LANJB            .040     .072     .250     .01
    MBOKC            .020     .019     .022     .01
    NCPLD            .072     .066
    ODQME            .079     .034
    PERNF            .023     .085     .438     .43
    QFSOG            .002
    RGTPH            .060     .013
    SHUQI            .066     .064     .005
    TIVRJ            .096     .272     .166
    UJWSK            .030
    VKXTL            .009
    WLYUM            .020     .008     .005
    XMZVN            .002
    YNAWO            .019     .006
    ZOBXP            .001
    APCYQ            .080     .066
    BQDZR            .016
    Q (digits)       1.248    .999     .602     .340

Trigram frequencies have also been tabulated and these are in column $N = 3$. For four and five letter sequences probabilities were obtained by multiplication from trigram frequencies since approximately

$$p(ijkl) = p(ijk)\, p_{jk}(l)$$
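The procedure just described (list every decipherment, attach an a priori probability to each, divide by the sum) can be sketched as follows. This is an illustration under a stated simplification: it treats successive letters as independent and takes the single-letter probabilities from the $N = 1$ column of Table 1, whereas the table itself was built from digram and trigram frequencies that are not reproduced here. The names `run_down_alphabet` and `posteriors` are hypothetical.

```python
import string

ALPHABET = string.ascii_uppercase

# Single-letter probabilities read from the N = 1 column of Table 1
# (each row's first letter). Assumption: letters are treated as independent.
LETTER_PROB = {
    "A": .080, "B": .016, "C": .032, "D": .036, "E": .123, "F": .023,
    "G": .016, "H": .051, "I": .072, "J": .001, "K": .005, "L": .040,
    "M": .020, "N": .072, "O": .079, "P": .023, "Q": .002, "R": .060,
    "S": .066, "T": .096, "U": .030, "V": .009, "W": .020, "X": .002,
    "Y": .019, "Z": .001,
}


def run_down_alphabet(text):
    """Return the 26 Caesar shifts of text, starting with text itself."""
    return [
        "".join(ALPHABET[(ALPHABET.index(c) + shift) % 26] for c in text)
        for shift in range(26)
    ]


def posteriors(text):
    """A posteriori probability of each member of the residue class of text."""
    candidates = run_down_alphabet(text)
    priors = []
    for candidate in candidates:
        p = 1.0
        for c in candidate:
            p *= LETTER_PROB[c]  # independent-letter a priori probability
        priors.append(p)
    total = sum(priors)
    # "The a posteriori probabilities are these divided by their sum."
    return {c: p / total for c, p in zip(candidates, priors)}


residue_class = run_down_alphabet("CREAS")
one_letter = posteriors("C")
print(residue_class[:3], round(one_letter["E"], 3))
```

Because the independent-letter model ignores digram and trigram structure, the figures it gives beyond $N = 1$ will not match the later columns of Table 1; only the listing of the residue class and the normalization step carry over.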
Note that at three letters the field has narrowed down to four messages of fairly high probability, the others being small in comparison. At four there are two possibilities and at five just one, the correct decipherment.

In principle this could be carried out with any system but unless the key is very small the number of possibilities is so large that the work involved prohibits the actual calculation.

This set of a posteriori probabilities describes how the cryptanalyst's knowledge of the message and key gradually becomes more precise as enciphered material is obtained. This description, however, is much too involved and difficult to obtain for our purposes. What is desired is a simplified description of this approach to uniqueness of the possible solutions.

We will first define a quantity $Q$ called the "equivocation" which measures in an average way the uncertainty of the solution, or how far it is from unicity. Suppose that a certain cryptogram $E$ of $N$ letters has been intercepted. The cryptanalyst can in principle calculate the a posteriori probabilities by the use of Bayes' theorem. Thus

$$P_E(M) = P(M)\, P_M(E) / P(E)$$

Similarly the probabilities for various keys, after $E$ has been intercepted, are given by

$$P_E(K) = P(K)\, P_K(E) / P(E)$$

The equivocation of the message should measure in some way how spread out these probabilities $P_E(M)$ are; how far they are from being concentrated at one message. In line with general principles of measuring such dispersion, as in the cases of choice, uncertainty, and generating information, we define the equivocation of the message when $E$ has been intercepted by

$$Q(M) = -\sum_M P_E(M) \log P_E(M)$$

the summation being over all possible messages. Similarly the equivocation in key when $E$ is intercepted is given by

$$Q(K) = -\sum_K P_E(K) \log P_E(K)$$

The same general arguments used to justify our measure of information rate may be used here, to justify the equivocation measure. We note that equivocation zero requires that one message (or key) have probability one, all others zero.
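As a small illustration of the definition, the sketch below computes $Q$ for a list of a posteriori probabilities. It is applied here to the $N = 3$ and $N = 4$ columns of Table 1, with logarithms to base 10 so that the result is in digits; the function name `equivocation` is an illustrative choice.

```python
import math


def equivocation(probabilities, base=10):
    """Q = -sum p log p over a set of a posteriori probabilities.

    Zero-probability entries are skipped, since p log p tends to 0 as p -> 0.
    """
    return -sum(p * math.log(p, base) for p in probabilities if p > 0)


# A posteriori probabilities from the N = 3 and N = 4 columns of Table 1.
column_n3 = [0.111, 0.250, 0.022, 0.438, 0.005, 0.166, 0.005]
column_n4 = [0.55, 0.01, 0.01, 0.43]

print(round(equivocation(column_n3), 3), round(equivocation(column_n4), 3))
```

The two values agree, to the precision of the tabulated probabilities, with the entries .602 and .340 in the last row of Table 1.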
Equivocation is measured in the same units as information, i.e. alternatives, digits, etc., according as the logarithmic base is 2, 10, etc. In fact, equivocation is almost identical with information, the difference being one of point of view. In information we emphasize the notion of how much freedom we have in choosing one element from a set with certain probabilities; in equivocation we emphasize the uncertainty of our knowledge of what was chosen when the probabilities have certain values.

Although any one number can hardly be expected to describe the set $P_E(M)$ perfectly for all purposes, I think the $Q$ defined here does as well as any single statistic can. Some of the theorems which follow indicate the mathematical "naturalness" of this particular measure.

The values of equivocation for the Caesar type cryptogram considered above have been calculated and are given in the last row of Table 1. This is the $Q$ for both key and message, the two being equal in this case.

The definitions given above involve a particular intercepted $E$, and are the equivocations for that intercepted cryptogram. We wish, however, to find a measure of the equivocation for the system as a whole, which will describe this progress toward uniqueness as $N$ increases in an average sort of way. To do this we form a weighted average of the equivocations for each particular intercepted message $E$, weighting in accordance with the probabilities of getting the $E$ in question. This will be called the mean equivocation of the system, or where there is no chance of confusion with the narrower equivocation for a particular $E$, we abbreviate to merely the equivocation. The mean equivocation of message is

$$Q(M) = -\sum_{M,E} P(E)\, P_E(M) \log P_E(M)$$

the summation being over all $M$ and all $E$. Since $P(E)\, P_E(M) = P(E, M)$, the probability of getting both $E$ and $M$, we can write this

$$Q(M) = -\sum P(M,E) \log P_E(M) = -\sum P(M,E) \log \frac{P(M)\, P_M(E)}{P(E)}$$

Similarly

$$Q(K) = -\sum P(K,E) \log \frac{P(K)\, P_K(E)}{P(E)}$$

Either of these mean equivocations is a theoretical measure of the secrecy value of the system.
We say theoretical since even when the equivocation is zero, which corresponds to no uncertainty as to the message, it may require a tremendous amount of labor to locate the particular message where the probability is one. It might, for example, be necessary to try each possible $K$ in succession until one was found that transformed the intercepted $E$ into reasonable text in the language. Thus a system would be practically very good, but theoretically solvable. The equivocation may be said to measure the degree of secrecy when the cryptanalyst has unlimited time and energy.

The equivocation is, of course, a function of $N$, the number of letters intercepted. The functions $Q(K,N)$ and $Q(M,N)$ will be called the equivocation characteristics of the system.

The following data will be helpful in forming a picture of what small values of equivocation represent.

An equivocation of .1 alternative would result (1) if 9 times in 10 there was no uncertainty as to $M$, the tenth time two $M$'s were equally probable, or (2) if every time there were two possibilities, one with probability .983, the other with probability .017, or (3) if 99 times in 100 there was no uncertainty, the 100th time 1000 equally likely possibilities.

An equivocation of .01 would result (1) if every time there were two possibilities, one with probability .999, the other with probability .001, or (2) if 99 times in 100 there is no uncertainty, the other time two equally likely possibilities, or (3) if 999 times in 1000 there is no uncertainty, the other time 6 or 7 equally likely possibilities.

20. Properties of Equivocation

Equivocation may be shown to have a number of interesting properties, most of which fit into our intuitive picture of how such a quantity should behave. We may first show, by example, the somewhat surprising fact that after a cryptanalyst has intercepted certain special $E$'s, his equivocation as to key
or message may be greater than before he intercepted anything. The intercepted material has increased his ignorance of what happened! Suppose there are only two messages $M_1$ and $M_2$ with a priori probabilities $p$ and $q$, and that a simple substitution is used according to the following table, the two keys $K_1$ and $K_2$ also having the a priori probabilities $p$ and $q$.

              K_1     K_2
    M_1       E_2     E_1
    M_2       E_1     E_2

Before the interception, the equivocation of both key and message is $-(p \log p + q \log q)$, which is less than one alternative if $p \ne q$. If $p \gg q$ there is little uncertainty as to which message and key will be chosen, $M_1$ and $K_1$. Now suppose he intercepts $E_1$. The a posteriori probabilities of both keys and both messages are easily seen to be 1/2, and hence the equivocation for both key and message is one alternative, greater than before. On the other hand, if $E_2$ is intercepted, the more probable event, the equivocation for both key and message decreases, more than enough to compensate for the other increase, and the mean equivocation of both key and message decreases.

This is a general property of all secrecy systems. The mean equivocation of key, $Q_K(N)$, is a non-increasing function of $N$. The mean equivocation of the first $A$ letters of the message is a non-increasing function of the number $N$ which have been intercepted. If $N$ letters have been intercepted, the equivocation of the first $N$ letters of message is less than or equal to that of the key.
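The two-message example can be worked numerically. The sketch below assumes the substitution table reads $M_1 \to E_2$ under $K_1$ and $E_1$ under $K_2$, with $M_2 \to E_1$ under $K_1$ and $E_2$ under $K_2$, which is the arrangement required for the a posteriori probabilities after $E_1$ to be 1/2 each. The value $p = 0.9$ and the function names are illustrative.

```python
import math


def entropy(probs):
    """-sum p log2 p, in alternatives; zero entries contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def two_message_example(p):
    q = 1 - p
    prior = entropy([p, q])  # equivocation before anything is intercepted

    # E_1 arises from (M_1, K_2) and from (M_2, K_1), each with probability p*q.
    p_e1 = 2 * p * q
    after_e1 = entropy([0.5, 0.5])

    # E_2 arises from (M_1, K_1) with probability p*p and (M_2, K_2) with q*q.
    p_e2 = p * p + q * q
    after_e2 = entropy([p * p / p_e2, q * q / p_e2])

    # Mean equivocation: weight each case by the probability of its cryptogram.
    mean = p_e1 * after_e1 + p_e2 * after_e2
    return prior, after_e1, after_e2, mean


prior, after_e1, after_e2, mean = two_message_example(0.9)
print(f"prior={prior:.3f} after_E1={after_e1:.3f} after_E2={after_e2:.3f} mean={mean:.3f}")
```

Intercepting $E_1$ raises the equivocation to one alternative, while the weighted average over both cryptograms still falls below the prior value.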
The three properties just stated may be written:

Theorem 12:
$$Q_K(S) \le Q_K(N), \qquad S \ge N$$
$$Q_M(S) \le Q_M(N), \qquad S \ge N$$
$$Q_M(N) \le Q_K(N)$$

The qualification regarding $A$ letters in the second result of the theorem is so that the equivocation will not be calculated with respect to the amount of message that has been intercepted. If it is, the message equivocation may (and usually does) increase for a time, due merely to the fact that more letters stand for a larger possible range of messages. The results of the theorem are what we might hope from a good measure of equivocation, since we would hardly expect to be worse off on the average after intercepting material than before. The fact that they can be proved gives additional justification to our definition.

The results of this theorem can be proved by a substitution in the property 6 of section 1. Thus to prove the first or second we have for any chance events $A$ and $B$

$$Q(B) \ge Q_A(B)$$

If we identify $B$ with the key (knowing the first $S$ letters of cryptogram) and $A$ with the remaining $N - S$ letters we obtain the first result. Similarly identifying $B$ with the message gives the second result. The last result follows from

$$Q(M) \le Q(K) + Q_K(M)$$

and the fact that $Q_K(M) = 0$ since $K$ uniquely determines $M$.

Theorem 13:
$$Q(K) = |M| - |E| + |K|$$
$$Q(M) = |M| - |E| + |H|$$
where
$$|H| = -\sum_{M,E} P(M,E) \log P_M(E)$$

We have

$$Q(K) = -\sum_{E,K} P(K)\, P_K(E) \log \frac{P(K)\, P_K(E)}{P(E)}$$

Hence

$$Q(K) = -\sum P(K)\, P_K(E) \log P(K) - \sum P(K)\, P_K(E) \log P_K(E) + \sum P(K)\, P_K(E) \log P(E)$$

Summing the first term on $E$ gives $-\sum P(K) \log P(K) = |K|$. In the second term $P_K(E)$ is $P(M)$, the unique $M$ that gives $E$ with key $K$. Summing on $K$ then gives $-\sum P(M) \log P(M) = |M|$. The third term is $\sum P(E) \log P(E) = -|E|$.

The second equation in the theorem is proved by the same method.

$$Q(M) = -\sum P(E)\, P_E(M) \log P_E(M) = -\sum P(M)\, P_M(E) \log \frac{P(M)\, P_M(E)}{P(E)}$$
$$= -\sum P(M)\, P_M(E) \log P(M) - \sum P(M)\, P_M(E) \log P_M(E) + \sum P(M)\, P_M(E) \log P(E)$$
$$= |M| - |E| - \sum P(M)\, P_M(E) \log P_M(E)$$

The last term here may be interpreted as follows. Group together all the different keys that transform a fixed $M$ into the same $E$, giving the total probability to the group, which will be $P_M(E)$. The last term is the average size of this group space weighted according to the probability $P(M)$ of choosing among the groups leading out of $M$. In case no group contains more than one element (at any rate no group from an $M$ with $P(M) > 0$) then $|H| = |K|$ and $Q(K) = Q(M)$. This is also clear since there is then a one-to-one correspondence between the keys and messages for any given $E$.

From the first equation of the theorem we may conclude that $Q(K) = |K|$ in case $|M| = |E|$. This latter occurs in particular if all $M$'s are equally likely and all $E$'s equally likely and there are the same number of each. It is easy to see that this is the case with a language in which every letter is equally likely and independent, and when almost any of the simple ciphers are used.

If we have a product system $S = TR$, it is to be expected that the second enciphering process does not decrease the equivocation of message, and this is actually true as can be shown by the methods used above. If $T$ and $R$ commute either may be considered as being the first and hence in this case the equivocation with $S$ is not less than the maximum for the two systems $R$ and $T$. Simple examples show that this does not hold necessarily if $R$ and $T$ do not commute.

Theorem 14: The equivocation in message of a product system $S = TR$ is not less than that when only $R$ is used. If $TR = RT$ it is not less than the maximum of those for $R$ and $T$ alone.
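The first identity of Theorem 13, $Q(K) = |M| - |E| + |K|$, can be checked numerically on a small system. The sketch below is one such check, assuming message and key are chosen independently and each key gives a one-to-one transformation; the three-letter Caesar cipher and the probabilities used are hypothetical values picked for illustration.

```python
import math
from collections import defaultdict


def entropy(probs):
    """-sum p log2 p over an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def check_theorem_13(message_probs, key_probs, encipher):
    """Return (Q(K), |M| - |E| + |K|) for a system with independent M and K."""
    joint_ke = defaultdict(float)  # P(K, E)
    p_e = defaultdict(float)       # P(E)
    for m, pm in message_probs.items():
        for k, pk in key_probs.items():
            e = encipher(k, m)
            joint_ke[(k, e)] += pm * pk
            p_e[e] += pm * pk

    # Mean equivocation of key: Q(K) = -sum P(K,E) log P_E(K), with P_E(K) = P(K,E)/P(E).
    q_k = -sum(p * math.log2(p / p_e[e]) for (k, e), p in joint_ke.items() if p > 0)

    rhs = entropy(message_probs.values()) - entropy(p_e.values()) + entropy(key_probs.values())
    return q_k, rhs


# Hypothetical three-letter Caesar cipher with non-uniform messages and keys.
messages = {0: 0.5, 1: 0.3, 2: 0.2}
keys = {0: 0.6, 1: 0.3, 2: 0.1}
q_k, rhs = check_theorem_13(messages, keys, lambda k, m: (m + k) % 3)
print(round(q_k, 6), round(rhs, 6))
```

The two printed values coincide, as the theorem requires whenever each key is a one-to-one map from messages to cryptograms.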
If we have a product of several systems $R\,S\,T\,U$, we can of course extend Theorem 14 to say that the equivocation of $R\,S\,T\,U$ is not less than that of $S\,T\,U$, which is not less than that for $T\,U$, etc. There is no similar theorem for the inner product since for example if $T$ and $R$ are inverse processes their inner product is the identity and the resulting equivocation zero.

Suppose we have a system $T$ which can be written as a weighted sum of several systems $R, S, \ldots, U$

$$T = p_1 R + p_2 S + \cdots + p_m U, \qquad \sum p_i = 1$$

and that systems $R, S, \ldots, U$ have equivocation characteristics $Q_1, Q_2, \ldots, Q_m$.

Theorem 15: The equivocation $Q$ of a weighted sum of systems is bounded by the inequalities
$$\sum p_i Q_i \le Q \le \sum p_i Q_i - \sum p_i \log p_i$$
These are best limits possible. The $Q$'s may refer either to key or to message.

The upper limit is achieved, for example, in strongly ideal systems (to be described later) where the decomposition is into the simple transformations of the system. The lower limit is achieved if all the systems $R, S, \ldots, U$ go to completely different cryptogram spaces. This theorem is also proved by the general inequalities governing equivocation,

$$Q_A(B) \le Q(B) \le Q(A) + Q_A(B)$$

We identify $A$ with the particular system being used and $B$ with the key or message.

There is a similar theorem for weighted sums of languages.

Theorem 16: Suppose a system can be applied to languages $L_1, L_2, \ldots, L_m$ and has equivocation characteristics $Q_1, Q_2, \ldots, Q_m$. When applied to the weighted sum $\sum p_i L_i$, the equivocation $Q$ is bounded by
$$\sum p_i Q_i \le Q \le \sum p_i Q_i - \sum p_i \log p_i$$
These limits are the best possible and the equivocations in question can be either for key or message.

The proof here is essentially the same as for the preceding case.

An important consequence of the result $Q(K) = |K| + |M| - |E|$ is the following.

Theorem 17: In any closed system, or any system where the total number of possible cryptograms is
equal to the number of possible messages of $N$ letters,
$$Q(K) \ge |K| - (|M_0| - |M|) = |K| - D_N$$
where $|M_0| = \log H$, with $H$ the number of possible messages of $N$ letters. $D_N$ is the total redundancy for $N$ letters.

This is true since $|M_0| \ge |E|$, the equality holding only if all cryptograms are equally likely. The theorem shows that in a closed system the key is determined only by the redundancy of the language: the equivocation can decrease only as the redundancy comes into action and at no greater rate.

Suppose we have a pure system and let the different residue classes of messages be $C_1, C_2, \ldots, C_r$. The corresponding set of residue classes of cryptograms is $C_1', \ldots, C_r'$. The probability of each $E$ in $C_i'$ is the same:

$$P(E) = \frac{P(C_i)}{\varphi_i}, \qquad E \in C_i'$$

where $\varphi_i$ is the number of different messages in $C_i$. Thus

$$|E| = -\sum_i P(C_i) \log \frac{P(C_i)}{\varphi_i}$$

Substituting in our equation for $Q$, we obtain:

Theorem 18: For a pure cipher
$$Q(K) = |K| + |M| + \sum_i P(C_i) \log \frac{P(C_i)}{\varphi_i}$$

This result can be used to compute $Q$ in many cases of interest.

From the analytic point of view pure ciphers have a simple structure. If a cryptogram is intercepted its residue class gives the complete information obtained by the cryptanalyst. Within the residue class the system is perfect: each message in the class has an a posteriori probability equal to its a priori probability. For large $N$, beyond the unicity point, there will usually only be one $M$ in the class of reasonable probability, and the problem is to determine this $M$.

The theorem on equivocation of pure ciphers can be altered to show this. With $k$ the number of (equally likely) keys, so that $|K| = \log k$, we have

$$\sum_i P(C_i) \log \frac{P(C_i)}{\varphi_i} = \sum_i P(C_i) \log P(C_i) + \sum_i P(C_i) \log \frac{k}{\varphi_i} - \sum_i P(C_i) \log k = \sum_i P(C_i) \log P(C_i) + Q_M(K) - |K|$$

Hence

$$Q(K) = |K| + |M| + \sum_i P(C_i) \log \frac{P(C_i)}{\varphi_i} = |M| + Q_M(K) + \sum_i P(C_i) \log P(C_i)$$

and

$$Q(M) = |M| - \left[-\sum_i P(C_i) \log P(C_i)\right]$$

The equivocation of message is the equivocation of message before the cryptogram was intercepted less the information imparted by specification of its residue class.

21.
Key Appearance Characteristic

Suppose the cryptanalyst has $N$ letters of message and $N$ letters of the equivalent cryptogram. Then he can calculate the a posteriori probabilities of the various keys on the basis of this information, and if $N$ is small there will remain a certain equivocation of key. For example in simple substitution, knowing 20 letters of message and cryptogram does not disclose the entire key, since only about 12 letters of the 26 will be represented. Thus there is a residual equivocation of $\log (26-12)!$, if exactly 12 letters appear. We define the mean residual key equivocation as

$$Q_M(K) = -\sum_{E,M} P(E,M) \sum_K P_{E,M}(K) \log P_{E,M}(K)$$

where $P(E,M)$ is the a priori probability of having message $M$ and cryptogram $E$, and $P_{E,M}(K)$ is the conditional probability of $K$ with $E$ and $M$ given. This may be written by obvious arguments (assuming all keys equally likely)

$$Q_M(K) = \sum P(M,K) \log X(M,K)$$

where $X(M,K)$ is the number of different keys from $M$ in parallel with $K$, that is, which go to the same $E$ as $K$.

For simple substitution let $P_x$ be the probability that a received cryptogram of $N$ letters has $x$ different letters appearing in it. Then

$$Q_M(K) = \sum_x P_x \log (26 - x)!$$

Approximately [expression illegible in the scan]. The bracketed terms vary slowly with $x$, and if $P_x$ is fairly well concentrated, we may take the bracket out, replacing $x$ by its mean value $\bar{x}$. This gives, after recombination,

$$Q_M(K) \approx \log (26 - \bar{x})!$$

This residual key equivocation is shown for simple substitution on English in Fig. 12. It measures how much of the key has not been used in enciphering $N$ letters of text on the average.

Theorem 19:
$$Q(K) = Q(M) + Q_M(K)$$
That is, the total key equivocation (when we don't know the message) is the sum of the message equivocation and the residual key equivocation; i.e., the equivocation there would be in the key if we did know the message.

This follows from the fact that the key uniquely determines the message, and properties 4 and 5 in Section 1.

22.
Equivocation for Simple Substitution on an Independent Letter Language

We will now calculate the mean equivocation in key or message when simple substitution is applied to a two letter language, probabilities $p$ and $q$ for 0 and 1, with successive letters independent. We have

$$Q_M = Q_K = -\sum P(E)\, P_E(K) \log P_E(K)$$

The probability that $E$ contains exactly $s$ 0's in a particular permutation is

$$\tfrac{1}{2}\left(p^s q^{N-s} + q^s p^{N-s}\right)$$

and the a posteriori probabilities of the identity and inverting substitutions are respectively

$$P_E(0) = \frac{p^s q^{N-s}}{p^s q^{N-s} + q^s p^{N-s}}, \qquad P_E(1) = \frac{q^s p^{N-s}}{p^s q^{N-s} + q^s p^{N-s}}$$

There are $\binom{N}{s}$ terms for each $s$ and hence

$$Q(N) = -\sum_s \binom{N}{s} p^s q^{N-s} \log \frac{p^s q^{N-s}}{p^s q^{N-s} + q^s p^{N-s}}$$

This may be written

$$Q(N) = -\sum_s \binom{N}{s} p^s q^{N-s} \left[ s \log p + (N-s) \log q - \log\left(p^s q^{N-s} + q^s p^{N-s}\right) \right]$$
$$= -N \left[ p \log p + q \log q \right] + \sum_s \binom{N}{s} p^s q^{N-s} \log\left(p^s q^{N-s} + q^s p^{N-s}\right)$$
$$= NR + \tfrac{1}{2} \sum_s \binom{N}{s} \left(p^s q^{N-s} + q^s p^{N-s}\right) \log\left(p^s q^{N-s} + q^s p^{N-s}\right)$$

For $p = 1/3$, $q = 2/3$, and for $p = 1/8$, $q = 7/8$, $Q$ has been calculated and is shown in Fig. 13.

Now assume the language contains $r$ different letters chosen independently and with probabilities $p_1, p_2, \ldots, p_r$. By approximately the same argument we have

$$Q(N) = -\sum \frac{N!}{s_1! \cdots s_r!}\, p_1^{s_1} p_2^{s_2} \cdots p_r^{s_r} \log \frac{p_1^{s_1} \cdots p_r^{s_r}}{\sum_P p_{a_1}^{s_1} \cdots p_{a_r}^{s_r}}$$

where $\sum s_i = N$ and $\sum_P$ is over all permutations of $1, 2, \ldots, r$ for $a_1, a_2, \ldots, a_r$. Hence, by obvious transformations

$$Q(N) = NR + \frac{1}{r!} \sum \frac{N!}{s_1! \cdots s_r!} \left( \sum_P p_{a_1}^{s_1} \cdots p_{a_r}^{s_r} \right) \log \sum_P p_{a_1}^{s_1} \cdots p_{a_r}^{s_r}$$

where $R = -\sum p_i \log p_i$. In particular,

$$Q(0) = \frac{1}{r!}\, r! \log r! = \log r! = |K|$$
$$Q(1) = R + \frac{1}{r!}\, r! \log (r-1)! = R + \log (r-1)!$$

This checks the evident answer for $Q(1)$: the first symbol has equivocation $R$ and the parts of the key not used add $\log (r-1)!$.

23.
The Equivocation Characteristic for a "Random" Closed Cipher

In the preceding section we have calculated the equivocation characteristic for a simple substitution applied to an independent letter language. This is about the simplest type of cipher and the simplest language structure possible, yet already the formulas are so involved as to be nearly useless. What are we to do with cases of practical interest, say the involved transformations of a fractional transposition system applied to English with its extremely complex statistical structure? This complexity itself suggests the method of approach. Sufficiently complicated problems can frequently be solved statistically. In order to do this we define the notion of a "random" cipher.

We suppose that the possible messages of length $N$ can be divided into two groups, one group of high and fairly uniform probability, while the total probability in the second group is small. This is usually possible in information theory if the messages have any reasonable length. Let the total number of messages be

$$H = 2^{R_0 N}$$

where $R_0$ is the maximum rate and $N$ the number of letters. The high probability group will contain about

$$S = 2^{RN}$$

where $R$ is the statistical rate.

The deciphering operation defines a function $M = T_i^{-1} E$ which can be thought of as a series of lines, $k$ for each $E$, going back to various $M$'s. By a random cipher we will mean one in which all keys are equally likely and the $k$ lines from any $E$ go back to random $M$'s.

The equivocation in key is given by

$$Q(K) = -\sum P(E)\, P_E(K) \log P_E(K)$$

The probability of exactly $m$ lines going back to the high probability group is

$$\binom{k}{m} \left(\frac{S}{H}\right)^m \left(1 - \frac{S}{H}\right)^{k-m}$$

If a cryptogram with $m$ lines going to high probability messages is intercepted, the equivocation is $\log m$. The probability of intercepting such a cryptogram is easily seen to be $\frac{mH}{Sk}$. Hence the mean equivocation is

$$Q = \frac{H}{Sk} \sum_{m=1}^{k} \binom{k}{m} \left(\frac{S}{H}\right)^m \left(1 - \frac{S}{H}\right)^{k-m} m \log m$$

We wish to find an approximation to this for large $k$.
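For a modest number of keys the sum above can simply be evaluated term by term. The following sketch does this in alternatives (logarithms to base 2); the function name `random_cipher_equivocation` and the choice $k = 64$ are illustrative, and `ratio` stands for $S/H$.

```python
import math


def random_cipher_equivocation(k, ratio):
    """Mean key equivocation of the random cipher, in alternatives.

    k is the number of equally likely keys (lines back from each cryptogram);
    ratio is S/H, the fraction of all messages in the high probability group.
    """
    total = 0.0
    for m in range(1, k + 1):
        # Binomial probability that exactly m of the k lines reach the group.
        weight = math.comb(k, m) * ratio**m * (1 - ratio) ** (k - m)
        total += weight * m * math.log2(m)
    # The factor H/(S k) in front of the sum.
    return total / (ratio * k)


k = 64
all_high = random_cipher_equivocation(k, 1.0)   # every message is in the group
half_high = random_cipher_equivocation(k, 0.5)  # half of the messages are
print(round(all_high, 4), round(half_high, 4))
```

When every message is in the high probability group ($S = H$) all $k$ lines count and the result is $\log k$, the full key size; with a smaller ratio the equivocation falls below that value.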
If the expected value of $m$, namely $\bar{m} = \frac{S}{H} k$, is $\gg 1$, the variation of $\log m$ over the range where the binomial distribution assumes large values will be small and we can replace $\log m$ by $\log \bar{m}$. This then comes out of the summation leaving the expected value of $m$. Hence in this condition

$$Q = \log \frac{S}{H} k = \log S - \log H + \log k = |K| - |M_0| + |M| = |K| - ND$$

If $\bar{m}$ is small compared to the large $k$, the binomial distribution can be approximated by a Poisson distribution*

$$\binom{k}{m} \left(\frac{S}{H}\right)^m \left(1 - \frac{S}{H}\right)^{k-m} \approx \frac{e^{-\lambda} \lambda^m}{m!}, \qquad \lambda = \frac{Sk}{H}$$

Hence

$$Q \approx \frac{1}{\lambda} e^{-\lambda} \sum_{m=2}^{\infty} \frac{\lambda^m}{m!}\, m \log m = e^{-\lambda} \sum_{m=1}^{\infty} \frac{\lambda^m}{m!} \log (m+1)$$

when we write $(m + 1)$ for $m$. This may be used in the region where $\lambda$ is near unity. For $\lambda \ll 1$ the only important term in the series is $m = 1$; omitting the others

$$Q = e^{-\lambda} \lambda \log 2 \approx \lambda \log 2 = 2^{|K| - ND} \log 2$$

*Fry, Probability and Its Engineering Uses, p. 214.

Thus $Q(K)$ starts off at $|K|$, and decreases linearly with slope $-D$ out to the neighborhood of $N = |K|/D$. After a short transition region, $Q$ follows an exponential with "half life" distance $1/D$ if $D$ is in alternatives per letter. If $D$ is in digits per letter, $1/D$ is the distance for a decrease by a factor of 10. The behavior is shown in Fig. 14 with the approximating curves.

By a similar argument given in the appendix, the equivocation of message can be calculated. It is

$$Q(M) = R_0 N \qquad \text{for } R_0 N \ll Q(K) \approx |K| - DN$$
$$Q(M) = Q(K) \qquad \text{for } R_0 N \gg Q(K)$$
$$Q(M) = Q(K) - \varphi(N) \qquad \text{for } R_0 N \sim Q(K)$$

where $\varphi(N)$ is the function of Fig. 14, with $N$ scale reduced by a factor of $\frac{D}{R_0}$. $Q(M)$ rises linearly with slope $R_0$ until this line intersects the $Q(K)$ line. After a rounded transition it follows $Q(K)$ down.

Most ciphers have an equivocation characteristic of this general type, approaching zero rather sharply. We will call the number of letters required for near unicity of solution the unicity distance.

24. Application to Standard Ciphers

The characteristic derived for the random cipher may be expected to apply approximately in many cases, providing some precautions are taken and certain corrections are made.
ThTmain points to be observed are the f ollowin 1. We assumed in deriving the random characteristic that the possible decipherments of a cryptogram are a random selection from the possible message This is not true in- actual oases, but becomes mc nearly true as the complexity of the operations used in the enciphering process and the complex! of the language structure increase. The more cc ' plicated the type pf cipher, the more it should follow the random characteristic. In the case c - 77 - a transposition cipher it is clear that letter frequencies are preserved. This means that the possible decipherments are chosen from a more limited group - not the entire message space - and the formula should be changed. In place of R one uses Ri the rate for independent letters but with the regular frequencies. This changes the redundancy from D - r q - r * .707 digits/letter D f " Rjl - R * •538 digits/letter and the equivocation reduoes more slowly. In some other cases a definite tendency toward re- turning the decipherments to high probability messages can be seen. If there is no clear tendency of this sort, and the system is fairly complicated, and the language a- natural one . (with its very complex statistical structure) - then it Is reasonable to make the random cipher assumption. In many cases the key does not all appear as soon as It might. For example in simple sub- stitution one must wait for a long time to find all letters of the alphabet represented in the message and thus deduce the complete key. The message becomes unique long before this point. Obviously our random assumption falls down in such a case, since all the different keys which differ only in the letters not yet appearing lead back to the same message, and are not ran- domly distributed. This error is easily cor- rected by the use of the key appearanoe character Istio. 
One uses, at a particular N, the amount of key that may be expected at that point in the formula for Q.

3. There are certain "end effects" due to the definite starting of the message which produce a discrepancy from the random characteristics. If we take a random starting point in English text the first letter (when we do not observe the preceding letters) has a possibility of being any letter with the ordinary letter probabilities. The next letter is more completely specified since we then have digram frequencies. This decrease in choice value continues for some time. The effect of this on the curve is that the straight line part is displaced, and approached by a curve depending on how much the statistical structure of the language is spread out over adjacent letters. As a first approximation the curve can be corrected by shifting the line over to the half redundancy point - i.e., the number of letters where the language redundancy is half its final value.

If account is taken of these three effects, reasonable estimates of the equivocation characteristic and unicity point can be made. The calculation can be done graphically as indicated in Figs. 15 and 16. One draws the key appearance characteristic |K| - [term illegible] and the total redundancy curve $|M_0| - |M|$ (which is usually sufficiently well represented by the line ND). The difference between these out to the neighborhood of their intersection is Q(M).

For the simple substitution the characteristic is shown in Fig. 17. In so far as experimental checks could be carried out they fit this curve very well. For example, the unicity point, at about 27 letters, can be shown experimentally to lie between the limits 22 and 30. With 30 letters one nearly always has a unique solution to a cryptogram of this type and with 22 it is usually easy to find a number of them.

With transposition of period d, the unicity point occurs at about 1.5 d log d/e. This also checks fairly well experimentally. Note that in this case Q is defined only
for integral multiples of d.

With the Vigenere the unicity point will occur at about 2d + 2 letters, and this too is about right. The Vigenere characteristic with the same key size as simple substitution will be approximately as shown in Fig. 18. The Vigenere, Playfair and Fractional cases are more likely to follow the theoretical formulas for random ciphers than simple substitution and transposition. The reason for this is that they are more complex and give better mixing characteristics to the messages on which they operate.

The mixed alphabet Vigenere (each of d alphabets mixed independently and used sequentially) has a key size

$|K| = d \log 26! = 26.3\, d$

and its unicity point should be at about 53 d + 2 letters.

These conclusions can also be put to a rough experimental test with the Caesar type cipher. In the particular cryptogram analyzed in Table I, section 19, the function Q(N) has been calculated and is given below, together with the values for a random cipher.

N                  0      1      2      3      4      5
Q (observed)      1.41   1.25   1.00   .60    .34    [illegible]
Q (calculated)    1.41   1.25   .98    .54    .15    .03

The agreement is seen to be quite good, especially when we remember that the observed Q should actually be the average of many different cryptograms, and that D for the larger values of N is only roughly estimated.

It appears then that the random cipher analysis can be used to estimate equivocation characteristics and the unicity distance for the ordinary types of ciphers.

25. Solving Systems Using Only N-Gram Structure.

The preceding analysis can also be applied to cases where the cryptanalyst is assumed to know or use only a limited knowledge of the structure of the language. If no data about the language other than the digram frequencies is used in solving cryptograms the equivocation curves may be computed, using for the redundancy curve that obtained from $D_2$ alone. This curve lies below the curve for all redundancy and the unicity point will therefore be moved to a larger N.
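How the unicity point moves when less redundancy is used can be illustrated with the uncorrected straight-line estimate N = |K|/D. The sketch below is an assumption-laden illustration, not the memorandum's graphical procedure: it ignores the key appearance and end-effect corrections of section 24 (so it does not reproduce corrected figures such as 27 letters for simple substitution), the function name is invented, and the digram-only redundancy value is a hypothetical number chosen only to show the direction of the effect.

```python
import math

def unicity_distance(key_size_digits, redundancy_digits_per_letter):
    """Letters needed before the straight line |K| - ND reaches zero."""
    return key_size_digits / redundancy_digits_per_letter

D_FULL = 0.707    # digits per letter, the figure used in the text
D_DIGRAM = 0.35   # hypothetical digram-only redundancy, for illustration

vigenere_key = 10 * math.log10(26)  # Vigenere of period d = 10

print(unicity_distance(vigenere_key, D_FULL))    # close to 2d
print(unicity_distance(vigenere_key, D_DIGRAM))  # a larger N
```

The first figure agrees with the 2d rule quoted for the Vigenere apart from the small additive term; the second shows the unicity point moving outward when only part of the redundancy is exploited.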
Fig. 19 shows the Q curves for simple substitution on normal English when the cryptanalyst uses only digram structures.

26. Validity of a Cryptogram Solution.

The equivocation formulas are relevant to questions which sometimes arise in cryptographic work regarding the validity of an alleged solution to a cryptogram. In the history of cryptography one finds many cryptograms, or possible cryptograms, where clever analysts have found a "solution". It involved, however, such a complex process, or the material was so scanty, that the question arose as to whether the cryptanalyst had "read a solution" into the cryptogram. See for example the Bacon-Shakespeare ciphers and the "Roger Bacon" manuscript.*

In general we may say that if a proposed system and key solves a cryptogram for a length of material considerably greater than the unicity distance the solution is trustworthy. If the material is of the same order or shorter than the unicity distance the solution is highly suspicious.

The effect of redundancy in gradually producing a unique solution to a cipher can be thought of in another way which is helpful. The redundancy is essentially a series of conditions on the letters of the message, which insure that it be statistically reasonable. These consistency conditions produce corresponding consistency conditions in the cryptogram. The key gives a certain amount of freedom to the cryptogram, but as more and more letters are intercepted, the consistency conditions use up the freedom allowed by the key. Eventually there is only one message and key which satisfy all the conditions and we have a unique solution. In the random cipher the consistency conditions are in a sense "orthogonal" to the "grain of the key", and have their full effect in eliminating messages and keys as rapidly as possible. This is the usual case. However, by proper design it is possible to "line up" the redundancy of the language with the "grain of the key" in such a way that the
consistency conditions are automatically satisfied and Q does not approach zero. These "ideal" systems are of such a nature that the transformations $T_i$ all induce the same probabilities in the E space. Ideal characteristics are shown in Fig. 20.

27. Ideal Secrecy Systems.

We have seen that perfect secrecy requires an infinite amount of key. With a finite key size, the equivocation of key and message generally approach zero, but not necessarily so. In fact it is possible for Q(K) to remain constant at its initial value |K|. Then, no matter how much material is intercepted, there is not a unique solution but many of comparable probability. We will define an "ideal" system as one in which Q(K) and Q(M) do not approach zero as $N \to \infty$. A "strongly ideal" system is one in which Q(K) remains constant at |K|.

*See Fletcher Pratt, "Secret and Urgent".

An example is a simple substitution on an artificial language in which all letter probabilities are the same and each letter independently chosen. It is clear that Q(K) = |K| and Q(M) rises linearly along a line of slope $R_0$ until it strikes the line Q(K), after which it remains constant at this value.

With natural languages it is in general possible to approximate the ideal characteristic - the unicity point can be made to occur for as large N as is desired. The complexity of the system needed usually goes up rapidly as we attempt to do this, however. It is not always possible to actually attain the ideal characteristic with any system of finite complexity.

To approximate the ideal equivocation, one may first operate on the message with a transducer which reduces it to the normal form - i.e., with all redundancies removed. After this almost any simple ciphering system - substitution, transposition, Vigenere, etc. - is satisfactory. The more elaborate the transducer and the nearer the output is to normal form, the more closely will the secrecy system approximate the ideal characteristic.
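The artificial-language example above can be checked by direct enumeration on a toy alphabet. The sketch below is illustrative only: the three-letter alphabet, the function names and the sample cryptograms are assumptions, all 3! substitution keys are taken as equally likely a priori, and the equivocation is computed for one particular cryptogram rather than as a mean over cryptograms.

```python
import itertools
import math

def key_posterior(cryptogram, letter_probs):
    """A posteriori probability of each simple substitution key.

    letter_probs maps each message letter to its probability; letters are
    assumed independent, and every key is equally likely a priori.
    """
    letters = sorted(letter_probs)
    weights = {}
    for perm in itertools.permutations(letters):
        # perm[i] is the cryptogram letter that replaces letters[i].
        decipher = {c: m for m, c in zip(letters, perm)}
        weight = 1.0
        for c in cryptogram:
            weight *= letter_probs[decipher[c]]
        weights[perm] = weight
    total = sum(weights.values())
    return {key: w / total for key, w in weights.items()}

def equivocation_bits(posterior):
    """Entropy of the a posteriori key distribution, in alternatives."""
    return -sum(p * math.log2(p) for p in posterior.values() if p > 0)

uniform = {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3}
skewed = {"a": 0.7, "b": 0.2, "c": 0.1}
print(equivocation_bits(key_posterior("abcabcaab", uniform)))
print(equivocation_bits(key_posterior("aaaaaaabbc" * 3, skewed)))
```

With equal letter probabilities every key remains equally likely whatever is intercepted, so the equivocation stays at log 3!. With unequal probabilities the frequency count quickly singles out a few keys and the equivocation falls well below that value.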
Theorem 20: A necessary and sufficient condition that T be strongly ideal is that for any two keys $T_i^{-1} T_j$ is a measure preserving transformation of $\Omega_M$ into itself.

This is true since the a posteriori probability of each key is equal to its a priori probability if and only if this condition is satisfied.

28. Examples of Ideal Secrecy Systems.

Suppose our language consists of a sequence of letters all chosen independently and with equal probability. Then the redundancy is zero, $|M_0| = |M|$, and from Theorem 11 Q(K) = |K|. We obtain the result

Theorem 21: If all letters are equally likely and independent any closed cipher is strongly ideal.

The equivocation of message will rise along the key appearance characteristic |K| - [term illegible] which will usually approach |K|, although in some cases it does not. In the cases of N-gram substitution, transposition, Vigenere and variations, fractional, etc., we have strongly ideal systems for this simple language with $Q(M) \to |K|$ as $N \to \infty$.

If the letters are independent but are not all equally probable, the transposition cipher characteristics remain essentially the same. The asymptotic equivocations of both key and message are clearly |K|. In the substitution cipher they will be less. If all the letter probabilities are different, then the asymptotic equivocations of both key and message are zero. The letters can all eventually be determined by frequency count (apart from certain exceptional sequences of zero measure).

Suppose now that there are 7 letters with probabilities

$$p_1 = p_2 < p_3 < p_4 = p_5 = p_6 < p_7.$$

In this case we cannot separate $p_1$ from $p_2$, or $p_4$, $p_5$ and $p_6$ from each other, but the different unequal probability groups can be eventually separated. If all substitutions are a priori equally likely, there will be an asymptotic uncertainty among $2! \times 3!$ equally likely (a posteriori) keys.
Hence, the asymptotic Q will be log 2! 3!. In general it is clear that the asymptotic equivocation with a substitution where the different substitutions are equally likely is

$$Q_\infty(M) = Q_\infty(K) = \log H$$

where H is the order of the group of substitutions on the letter probabilities $p_1, \ldots, p_n$ which leave this set invariant.

More generally we can consider an arbitrary pure system T and a pure language L. Suppose that T operates only "locally" on the letters of M in the sense that the nth letter of cryptogram depends only on n and a certain finite number of the letters of M in the neighborhood of the nth one:

$$e_n = f(K, n;\ m_{n-p}, \ldots, m_{n+p}).$$

Then we can show that there is a certain subgroup of the transformations $T_i^{-1} T_j$ which are probability preserving in the language L. In the limiting cases these would consist of the identity or of the whole group $T_i^{-1} T_j$.

Theorem 22: Under these conditions the asymptotic equivocation of key is the logarithm of the order of this subgroup of measure preserving transformations.

An ideal secrecy system suffers from a number of disadvantages.

1. The system must be closely matched to the language. This requires an extensive study of the structure of the language by the designer. Also a change in statistical structure or a selection from the set of possible messages as in the case of probable words (words expected in this particular cryptogram) renders the system vulnerable to analysis.

2. The structure of natural languages is extremely complicated, and this reflects in a complexity of the transformations required to reduce them to the normal form. Thus any machine to perform this operation must necessarily be quite involved, at least in the direction of information storage, since a "dictionary" of magnitude greater than that of an ordinary dictionary is to be expected.

3. In general, reduction of a natural language to a normal form introduces a bad propagation of error characteristic.
Error in transmission of a single letter produces a region of changes near it of size comparable to the length of statistical effects in the original language.

29. Multiple Substitute Ideal Systems.

There is another way of obtaining ideal or nearly ideal characteristics using multi-valued secrecy systems. Suppose our language contains only three letters with probabilities 1/8, 3/8 and 4/8, and that successive letters in a message are chosen independently. Let there be 1 substitute for the first letter, 3 for the second and 4 for the third, and choose at random among the possible substitutes for a letter. It is clear that this system is ideal. If the different probabilities are incommensurable, we cannot exactly achieve the ideal behavior, but can approximate it, by using enough substitutes, as closely as desired.

If the language is more complex, with transition probabilities, this general method can still be used, but it becomes more involved. Suppose the choice of a letter depends only on the two preceding letters, not on any more remote part of the message. The transition probabilities $p_{ij}(k)$ completely describe the statistical structure of the language. We supply substitutes for k when it follows i, j in proportion to $p_{ij}(k)$. Of all our m substitutes $m\, p_{ij}(k)$ represent k after the pair i, j. As before one chooses from the possible substitutes for a letter at random. The cryptogram will then be a random sequence of the m substitute letters.

As an example, suppose the $p_i(j)$ are the only statistics of the language and the values are given by the following table (entries in brackets are not legible in this copy and are inferred from the row sums):

i \ j     1      2      3
1        .1     .3     .6
2        .5     [.2]   .3
3        .9     .1     [0]

With 10 substitutes 0, 1, 2, ..., 9 we construct a substitution table assigning substitutes (chosen randomly) in proportion to the frequencies. The following is a typical key (bracketed entries inferred as above):

i \ j     1                    2          3
1         7                    0,5,6      1,2,3,4,8,9
2         [1,2,5,6,7]          3,9        0,4,8
3         0,1,2,3,5,6,7,8,9    4          [none]

If a 3 follows a 2 in the message we substitute one of 0, [4, 8] for it, the choice being random.
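The simpler three-letter case at the start of this section (probabilities 1/8, 3/8 and 4/8 with 1, 3 and 4 substitutes) can be sketched as follows. This is a minimal illustration, not a practical system: the letter names A, B, C, the particular assignment of the substitutes 0 to 7 and the function names are all hypothetical, and in use the assignment itself would be the secret key.

```python
import random
from fractions import Fraction

# Letter probabilities from the text; the letter names are assumptions.
LETTER_PROBS = {"A": Fraction(1, 8), "B": Fraction(3, 8), "C": Fraction(4, 8)}

# One possible key: substitutes in proportion to the probabilities.
SUBSTITUTES = {"A": [0], "B": [1, 2, 3], "C": [4, 5, 6, 7]}

def substitute_probabilities():
    """Probability of each cryptogram symbol for a single message letter."""
    return {
        sub: LETTER_PROBS[letter] / len(subs)
        for letter, subs in SUBSTITUTES.items()
        for sub in subs
    }

def encipher(message, rng):
    """Replace each letter by one of its substitutes chosen at random."""
    return [rng.choice(SUBSTITUTES[letter]) for letter in message]

def decipher(cryptogram):
    """Invert the table: every substitute stands for exactly one letter."""
    inverse = {s: letter for letter, subs in SUBSTITUTES.items() for s in subs}
    return "".join(inverse[s] for s in cryptogram)

print(substitute_probabilities())
print(decipher(encipher("CBCCABCB", random.Random(0))))
```

Every one of the eight substitutes comes out with probability exactly 1/8, so the cryptogram symbols are equally likely whichever assignment is used. That is the sense in which the frequency conditions give the cryptanalyst no hold on the key.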
A second table must be supplied for the first letter of the message, corresponding to the unconditional probabilities of the three letters.

Although of theoretical interest it is doubtful whether such systems would be of much use practically because of their complexity and message expansion in ordinary cases. However, the first approximation to such systems, matching letter frequencies, has been used in ciphers and is standard practice in codes (where one matches word frequencies).

30. Equivocation Rate.

We now return briefly to cases where the key is not finite, but is supplied constantly, as in the Vernam system and the running key cipher. In such cases we may define equivocation "rates". One considers the equivocation Q(N) of the message when N letters have been intercepted. The equivocation rate for the message is defined as the limit (assuming it exists):

$$Q' = \lim_{N \to \infty} \frac{Q(N)}{N}.$$

The rate for equivocation of key would be defined similarly using the equivocation in the part of the key that has been used only, but of course these two are the same. There are results for these parameters analogous to those obtained with finite key cases. Let R' be the mean rate of using key.

Theorem 23: $Q' \le R'$.

In case the equality holds we have the analogue of ideal systems where the complete information of the key goes into equivocation. If R' > R, the rate of the message source, we can obtain perfect secrecy. In fact we may define perfect secrecy as the case in which Q' = R. In the random case we have the analogous result Q' = R' - D.

31. Further Remarks on Equivocation and Redundancy.

We have taken the redundancy of "normal English" to be about .7 digits per letter or 50% of $R_0$. This is on the assumption that word divisions were omitted. It is an approximate figure based on statistical structure of the order of lengths of perhaps 8 letters, and assumes the text to be of an ordinary type, such as newspaper writing, literary work, etc.
Various methods of calculating redundancy have been devised and will be described in the memorandum on information mentioned in the introduction. We may note here two methods of roughly estimating this number which are of cryptographic interest.

A running key cipher is a Vernam type system where in place of a random sequence of letters the key is a meaningful text. Now it is known that running key ciphers can usually be solved uniquely. This shows that English can be reduced by a factor of two to one and implies a redundancy of at least 50%. This figure cannot be reduced very much, however, for a number of reasons, unless long range "meaning" structure of English is considered.

The running key cipher can be easily improved to lead to ciphering systems which could not be solved without the key. If one uses in place of one English text, about 4 different texts as key, adding them all to the message, a sufficient amount of key has been introduced to produce a high positive equivocation rate. Another method would be to use say every 10th letter of the text as key. The intermediate letters are omitted and cannot be used at any other point of the message. This has the same effect, since the mean rate for these spaced letters must be over .8 $R_0$. These methods might be useful for spies or diplomats who could use books or magazines for the key source.

A second way of showing the high redundancy of English is to delete all vowels from a passage. In general it is possible to fill them in again uniquely and recover the original, without knowing it in advance. As the vowels constitute about 40% of the text this puts a limit on the redundancy. Actually there is considerable redundancy left, the various letter and digram frequencies being far from uniform.
This suggests a simple way of greatly improving almost any simple ciphering system. First delete all vowels, or as much of the message as possible without running the risk of multiple solutions, and then encipher the residue. Since this reduces the redundancy by a factor of perhaps 3 or 4 to 1, the unicity point will be moved out by this factor. This is one way of approaching ideal systems - using the decipherer's knowledge of English as part of the deciphering system.

Two extremes of redundancy in English prose are represented by Basic English and Joyce's "Finnegans Wake". The Basic English vocabulary consists of only 850 words and a rough estimate puts the redundancy at about 70%. A cipher applied to this sort of text would rapidly approach unicity. Joyce, on the other hand, would be relatively [illegible]. The small redundancy is disclosed by the difficulty in filling in correctly even a single missing letter from "Finnegans Wake". What the numerical value is would be difficult to determine; it varies widely throughout the book.

The mathematical extremes of redundancy, 0 and 100%, can be constructed in artificial languages. In one extreme we have e.g. a single possible message. Q(M) = 0 identically and Q(K) in the random cipher case declines as rapidly as possible, i.e., as rapidly as one sends information on the system. In the other extreme all letter sequences are equally likely, and any closed ciphering system is ideal.

We may refer here to a memorandum by Nyquist (Enciphering - Effect of Redundancy in Language, May 30, 1944) in which some questions of the type we are considering here are discussed.

32. Distribution of Equivocation.

A more complete description of a secrecy system applied to a language than is afforded by the equivocation characteristics can be found by giving the distribution of equivocation.
For N intercepted letters we consider the fraction of cryptograms for which Q (for these particular E's, not the mean Q) lies between certain limits. This gives a density distribution function P(Q, N) dQ, the probability that, for N letters, Q lies between the limits Q and Q + dQ. The mean equivocation we have previously studied is the mean of this distribution, $\int Q\, P(Q, N)\, dQ$.

The function P(Q, N) can be thought of as plotted along a third dimension, normal to the paper, on the Q, N plane. If the language is pure, with a small influence range (compared to N) and the cipher is pure the function P(Q, N) will usually be a ridge in this plane whose highest point follows approximately the mean at least until near the unicity point. In this case, or when the conditions are nearly verified, the mean Q curve gives a reasonably complete picture of the system.

On the other hand, if the language is not pure, but made up of a set of pure components

$$L = \sum_i p_i L_i$$

having different equivocation curves with the system, say $Q_1, Q_2, \ldots, Q_n$, then the total Q distribution will usually be made up of a series of ridges. There will be one for each $L_i$ weighted in accordance with its $p_i$. The mean equivocation characteristic will be a line somewhere in the midst of these ridges and may not give a very complete picture of the situation. This is shown in Fig. 21.

A similar effect occurs if the system is not pure but made up of several systems with different Q curves. There is then a series of ridges in the P(Q, N) plot, and the mean Q strikes an average which may lie between ridges and be a very improbable value of Q for a particular cryptogram. These effects are illustrated in Fig. 22.

The effect of mixing pure languages which are near to one another in statistical structure is to increase the width of the ridge.
Near the unicity point this tends to raise the mean equivocation, since equivocation cannot become negative and the spreading is chiefly in the positive direction. We expect therefore, that in this region the calculations based on the random cipher should be somewhat low.

PART III
Practical Secrecy

33. The Work Characteristic.

After the unicity point has been passed there will usually be a unique solution to the cryptogram. The problem of isolating this single solution of high probability is the problem of cryptanalysis. In the region before the unicity point we may say that the problem of cryptanalysis is that of isolating all the possible solutions of high probability (compared to the remainder) and determining their various probabilities.

Although it is always possible in principle to determine these solutions (by trial of each possible key for example) different enciphering systems show a wide variation in the amount of work required. The average amount of work to determine the key for a cryptogram of N letters, W(N), measured say in man hours, may be called the work characteristic of the system. This average is taken over all messages and all keys with their appropriate probabilities.

For a simple substitution on English the work and equivocation characteristics would be somewhat as shown in Fig. 23. The dotted portion of the curve is where there are numerous possible solutions and these must all be determined. In the solid portion after the unicity point only one solution exists in general, but if only the minimum necessary data are given a great deal of work must be done to isolate it. As more material is used the work rapidly decreases toward some asymptotic value - where the additional data no longer reduces the labor.

This is the work characteristic for the key. It is clear that after the unicity point this function can never increase.
There is also a work characteristic for the message, the average amount of work to determine the message (or all reasonable messages). This will, in ordinary cases, be below or at any rate not far above the work characteristic for the key, out to fairly large N, since generally if the key is determined it is easy to find M by the deciphering transformation. For very large N, however, this function will increase due merely to the labor of deciphering the large amount of intercepted material.

Essentially the behavior [illegible] can be expected with any type of secrecy system where the equivocation approaches zero. The scale of man hours required, however, will differ greatly with different types of ciphers, even when the Q curves are about the same. A Vigenere or compound Vigenere, for example, with the same key size would have a much better (i.e., much higher) work characteristic. A good practical secrecy system is one in which the W(N) curve remains sufficiently high out to the number of letters one expects to transmit with the key, to prevent the enemy from actually carrying out the solution, or to delay it to such an extent that the information is obsolete.

We will consider in the following sections ways of keeping the function W(N) large, even though Q may be practically zero. This is essentially a "max min" type of problem as is always the case when we have a battle of wits.* In designing a good cipher we must maximize the minimum amount of work the enemy must do to break it. It is not enough merely to be sure none of the standard methods of cryptanalysis work - we must show that no method whatever will break the system easily. [illegible] many systems; they were designed to resist all known methods of solution, but had a structure leading to [illegible] which applied to [illegible] disclosed weaknesses of their own. [illegible] is essentially one [illegible] in a field. [illegible] sure that a system which is not [illegible]
*See von Neumann and Morgenstern, "Theory of Games". The situation between the cipher designer and cryptanalyst can be thought of as a "game" of a very simple structure; a zero-sum two person game with complete information, and just two "moves". The cipher designer chooses a system for his "move". Then the cryptanalyst is informed of this choice and chooses a method of analysis. The "value" of the play is the average work required to break a cryptogram in the system by the method chosen.

(1) We can study the possible methods of solution available to the cryptanalyst and attempt to describe them in sufficiently general terms to cover any methods he might use. We then construct our system to resist this "general" method of solution.

(2) We may construct our ciphers in such a way that breaking them is equivalent to (or requires at some point in the process) the solution of some problem known to be laborious. Thus, if we could show that solving a system requires at least as much work as solving a system of simultaneous equations in a large number of unknowns, of a complex type, then we will have a lower bound of sorts for the work characteristic.

The next three sections are aimed at these general problems. It is difficult to define the pertinent ideas involved with sufficient precision to obtain results in the form of mathematical theorems, but it is believed that the conclusions, in the form of general principles, are correct.

34. Generalities on the Solution of Cryptograms.

After the unicity distance has been exceeded in intercepted material, any system can be solved in principle by merely trying each possible key until the unique solution is obtained, i.e., a deciphered message which "makes sense" in [illegible].
A simple calculation shows that this method of solution (which we may call complete trial and error) is totally impractical except when the key is absurdly small.

Suppose, for example, we have a key of 26! possibilities or about 26.3 digits, the same size as in simple substitution on English. This is, by any significant measure, a small key. It can be written on a small slip of paper, or memorized in a few minutes. It could be registered on 27 switches each having ten positions or on 88 two position switches.

Suppose further, to give the cryptanalyst every possible advantage, that he constructs an electronic device to try keys at the rate of one each microsecond (perhaps automatically selecting from the results by a $\chi^2$ test for statistical significance). He may expect to reach the right key about half way through, and after an elapsed time of about

$$\frac{2 \times 10^{26}}{2 \times 60^2 \times 24 \times 365 \times 10^6} = 3 \times 10^{12} \text{ years.}$$

In other words, even with a small key complete trial and error will never be used in solving cryptograms, except in the trivial case where the key is extremely small, e.g., the Caesar with only 26 possibilities, or 1.4 digits. The trial and error which is used so commonly in cryptography is of a different sort, or is augmented by other means. If one had a secrecy system which required complete trial and error it would be extremely safe. Such a system would result, it appears, if the original messages, all say of 1000 letters, were a random selection of $2^{RN}$ from the set of all $2^{R_0 N}$ sequences of 1000 letters. If any of the simple ciphers were applied to these it seems that little improvement over complete trial and error would be possible.

The methods actually used often involve a great deal of trial and error, but in a different way. First, the trials progress from more probable to less probable hypotheses, and second, each trial disposes of a large group of keys, not a single one.
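The elapsed-time estimate for complete trial and error is easy to reproduce. The short sketch below is an illustration under the same assumptions as the text (one key tried each microsecond, the right key reached about half way through); the function and parameter names are invented for this example.

```python
import math

def complete_trial_years(num_keys, keys_per_second=1e6):
    """Expected years to find the key when trying keys one at a time.

    Assumes the right key is reached about half way through the key
    space, and a 365 day year as in the text's formula.
    """
    seconds_per_year = 60 ** 2 * 24 * 365
    expected_trials = num_keys / 2
    return expected_trials / keys_per_second / seconds_per_year

print(complete_trial_years(2e26))                # the text's rounded key count
print(complete_trial_years(math.factorial(26)))  # 26! evaluated exactly
print(complete_trial_years(26))                  # the Caesar cipher
```

With the rounded figure of 2 x 10^26 keys the function gives roughly 3 x 10^12 years, as in the text. Evaluating 26! exactly gives a key count about twice as large, which leaves the conclusion unchanged.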
Thus the key space may be divided into say 10 subsets, each containing about the same number of keys. By at most 10 trials one determines which subset is the correct one. This subset is then divided into several secondary subsets and the process repeated. With the same key size ($K = 26! \doteq 2 \times 10^{26}$) we would expect about 26 x 5 or 130 trials as compared to $10^{26}$ by complete trial and error. The possibility of choosing the most likely of the subsets first for test would improve this result even more. If the divisions were into two compartments (the best way) only 90 trials would be required. Whereas complete trial and error requires trials to the order of the number of keys, this subdividing trial and error requires only trials to the order of the key size in alternatives.

This remains true even when the different keys have different probabilities. The proper procedure then to minimize the expected number of trials is to divide the key space into subsets of equiprobability. When the proper subset is determined, this is again subdivided into equiprobability subsets. If this process can be continued the number of trials expected when each division is into two subsets will be

$$\frac{|K|}{\log 2}.$$

If each test has S possible results and each of these corresponds to the key being in one of S equiprobability subsets, then

$$\frac{|K|}{\log S}$$

trials will be expected. The intuitive significance of these results should be noted. In the two compartment test with equiprobability, each test yields one alternative of information as to the key. If the subsets have very different probabilities as in testing a single key in complete trial and error only a small amount of information is obtained from the test. Thus with 26! equiprobable keys, a test of one yields only

$$-\left[ \frac{26! - 1}{26!} \log \frac{26! - 1}{26!} + \frac{1}{26!} \log \frac{1}{26!} \right]$$
or about $10^{-25}$ alternatives of information. Dividing into S equiprobability subsets maximizes the information obtained from each trial at log S, and the expected number of trials is the total information to be obtained, that is the key size, divided by this amount.

The question here is similar to various coin weighing problems that have been circulated recently. A typical example is the following: It is known that one coin in 27 is counterfeit, and slightly lighter than the rest. A chemist's balance is available and the counterfeit coin is to be isolated by a series of weighings. What is the least number of weighings to do this? The correct answer is 3, obtained by first dividing the coins into three groups of 9 each. Two of these are compared on the balance. The three possible results determine the set of 9 containing the counterfeit. This set is then divided into 3 subsets of 3 each and the process continued. The set of coins corresponds to the set of keys, the counterfeit coin to the correct key, and the weighing procedure to a trial or test.

This method of solution is feasible only if the key space can be divided into a small number of subsets, with a simple method of determining to which subset the correct key belongs. Stated in another way, it is possible to solve for the key bit by bit. One does not need to assume a complete key in order to apply a consistency test and determine if the assumption is justified - an assumption on a part of the key (or as to whether the key is in some large section of the key space) can be tested.

This is one of the greatest weaknesses of most ciphering systems. For example, in simple substitution, an assumption on a single letter can be checked against its frequency, variety of contact, doubles or reversals, etc. In determining a single letter the key space is reduced by 1.4 digits from the original 26. The same effect is seen in all the elementary types of ciphers.
In the Vigenère, the assumption of two or three letters of the key is easily checked by deciphering at other points with this fragment and seeing whether clear emerges. The compound Vigenère is much better from this point of view, if we assume a fairly large number of component periods, producing a repetition rate larger than will be intercepted. Here as many key letters are used in enciphering each letter as there are periods - although this is only a fraction of the entire key, at least a fair number of letters must be assumed before a consistency check can be applied.

Our first conclusion then, regarding practical small key cipher design, is that a considerable amount of key should be used in enciphering each small element of the message.

35. Statistical Methods

It is possible to solve many kinds of ciphers by statistical analysis. Consider again simple substitution. The first thing a cryptographer does with an intercepted cryptogram is to make a frequency count. If the cryptogram contains say 200 letters it is safe to assume that few, if any, letters are out of their frequency groups, this being a division into 4 sets of well defined frequency limits. The log of the number of keys within this limitation may be calculated as

$$\log\,(2!\;9!\;9!\;6!) = 14.28$$

and the simple frequency count thus reduces the key uncertainty by 12 digits, a tremendous gain.

In general, a statistical attack proceeds as follows. A certain statistic is measured on the intercepted cryptogram E. This statistic is such that for all reasonable M it assumes about the same value, $S_K$, the value depending only on the particular key K that was used. The value thus obtained serves to limit the possible keys to those which would give values of S in the neighborhood of that observed.
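The frequency-group figure in this section is easy to reproduce. A minimal sketch follows, assuming the four frequency groups contain 2, 9, 9 and 6 letters as in the formula above; the helper name is illustrative.

```python
import math

def log10_keys_within_groups(group_sizes):
    """log10 of the number of substitution keys that keep every letter
    inside its own frequency group: the product of the group factorials."""
    return sum(math.log10(math.factorial(size)) for size in group_sizes)

groups = [2, 9, 9, 6]                         # four frequency groups, 26 letters
remaining = log10_keys_within_groups(groups)  # key uncertainty after the count
total = math.log10(math.factorial(26))        # key size of simple substitution
print(round(remaining, 2))                    # digits still uncertain
print(round(total - remaining, 2))            # digits removed by the count
```

The difference comes out a little over 12 digits, in line with the reduction quoted in the text.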
A statistic which does not depend on K or which varies as much with M as with K is not of value in limiting K. Thus in transposition ciphers, the frequency count of letters gives no information about K - every K leaves this statistic the same. Hence one can make no use of a frequency count in breaking transposition ciphers.

More precisely one can ascribe a "solving power" to a given statistic S. For each value of S there will be a conditional equivocation of the key $Q_S(K)$, the equivocation when S has its particular value and that is all that is known concerning the key. The weighted mean of these values

$$\sum P(S)\, Q_S(K)$$

gives the mean equivocation of the key when S is known, P(S) being the a priori probability of the particular value S. The key size |K| less this mean equivocation measures the "solving power" of S.

In a strongly ideal cipher all statistics of the cryptogram are independent of the particular key used. This is the measure preserving property of $T_j T_k^{-1}$ on the E space or $T_j^{-1} T_k$ on the M space mentioned above.

There are good and poor statistics, just as there are good and poor methods of trial and error. Indeed the trial and error testing of a hypothesis is a type of statistic, and what was said above regarding the best types of trials holds generally. A good statistic for solving a system must have the following properties:

1. It must be simple to measure.
2. It must depend more on the key than on the message if it is meant to solve for the key. The variation with M should not mask its variation with K.
3. The values of the statistic that can be "resolved" in spite of the "fuzziness" produced by variation in M should divide the key space into a number of subsets of comparable probability, with the statistic specifying the one in which the correct key lies. The statistic should give us sizable information about the key, not a tiny fraction of an alternative.
4. The information it gives must be simple and usable.
Thus the subsets in which the statistic locates the key must be of a simple nature in the key space.

Frequency count for simple substitution is an example of a very good statistic.

Two methods (other than recourse to ideal systems) suggest themselves for frustrating a statistical analysis. These we may call the methods of diffusion and confusion. In the method of diffusion the statistical structure of M which leads to its redundancy is "dissipated" into long range statistics - i.e., into statistical structure involving long combinations of letters in the cryptogram. The effect here is that the enemy must intercept a tremendous amount of material to tie down this structure, since the structure is evident only in blocks of very small individual probability. Furthermore even when he has sufficient material, the analytical work required is much greater since the redundancy has been diffused over a large number of individual statistics. An example of diffusion of statistics is operating on a message $M = m_1, m_2, m_3, \ldots$ with a "smoothing" operation, e.g.,

$$y_n = \sum_{i=1}^{s} m_{n+i} \pmod{26},$$

adding s successive letters of the message to get a letter $y_n$. One can show that the redundancy of the y sequence is the same as that of the m sequence, but the structure has been dissipated. Thus the letter frequencies in y will be more nearly equal than in m, the digram frequencies also more nearly equal, etc. Indeed any reversible operation which produces one letter out for each letter in and does not have an infinite "memory" has an output with the same redundancy as the input. The statistics can never be eliminated without compression, but they can be spread out.

The method of confusion is to make the relation between the simple statistics of E and the simple description of K a very complex and involved one.
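The smoothing operation given above as an example of diffusion can be sketched in a few lines. This is an illustration under stated assumptions, not the memorandum's own procedure: letters are coded A = 0, ..., Z = 25, only positions with a full window of s letters are produced, and the index of coincidence is an added yardstick (not used in the text) for how uneven the single-letter frequencies are.

```python
from collections import Counter

def to_numbers(text: str) -> list[int]:
    """Code letters as A=0 ... Z=25, dropping everything else."""
    return [ord(c) - ord("A") for c in text.upper() if "A" <= c <= "Z"]

def smooth(message: list[int], s: int) -> list[int]:
    """y_n = m_{n+1} + ... + m_{n+s} (mod 26) for every n with a full window."""
    return [sum(message[n:n + s]) % 26 for n in range(len(message) - s + 1)]

def index_of_coincidence(letters: list[int]) -> float:
    """Chance that two letters drawn without replacement are the same."""
    n = len(letters)
    counts = Counter(letters)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))

sample = to_numbers(
    "it is possible to solve many kinds of ciphers by statistical analysis"
)
print(index_of_coincidence(sample))             # single-letter unevenness of m
print(index_of_coincidence(smooth(sample, 5)))  # the same measure for y
```

On long English text the second figure would be expected to move toward 1/26, the value for equally likely letters; a sample this short only hints at the effect.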
In the case of simple substitution it was easy to describe the limitation of K imposed by the letter frequencies of E. If the connection is very involved and confused the enemy can still evaluate a statistic $S_1$ say which limits the key to a region of the key space. This limitation, however, is to some complex region $R_1$ in the space - folded over many times, and he has a difficult time making use of it. A second statistic $S_2$ limits K still further to $R_2$, hence it lies in the intersection region $R_1 R_2$, but this does not help much because it is so difficult to determine just what the intersection is.

To be more precise let us suppose the key space has certain "natural coordinates" $k_1, k_2, \ldots, k_p$ which he wishes to determine. He measures a set of statistics $s_1, s_2, \ldots, s_n$ and these are sufficient to determine the $k_i$. However, in the method of confusion, the equations connecting these sets of variables are involved and complex. We have, say,

$$f_1(k_1, k_2, \ldots, k_p) = s_1$$
$$f_2(k_1, k_2, \ldots, k_p) = s_2$$
$$\cdots$$
$$f_n(k_1, k_2, \ldots, k_p) = s_n$$

and all the $f_i$ involve all the $k_i$. The cryptographer must solve this system simultaneously - a difficult job. In the simple (not confused) cases the functions involve only a small number of the $k_i$ - or at least some of these do. One first solves the simpler equations, evaluating some of the $k_i$, and substitutes these in the more complicated equations.

The conclusion here is that for a good ciphering system steps should be taken either to diffuse or confuse the redundancy (or both).

36. The Probable Word Method

One of the most powerful tools for breaking ciphers is the use of probable words. The probable words may be words or phrases expected in the particular message due to its source, or they may merely be common words or syllables which occur in any text in the language, such as the, and, tion, that, etc.
In general, the probable word method is used as follows. Assuming a probable word to be at some point in the clear, the key or a part of the key is determined. This is used to decipher other parts of the cryptogram and provide a consistency test. If the other parts come out in clear, the assumption is justified.

There are few of the classical type ciphers that use a small key and can resist long under a probable word analysis. From a consideration of this method we can frame a test of ciphers which might be called the acid test. It applies only to ciphers with a small key (less than say 50 digits), applied to natural languages, and not using the ideal method of gaining secrecy. The acid test is this: How difficult is it to determine the key or a part of the key knowing a sample of message and corresponding cryptogram? Any system in which this is easy cannot be very resistant, for the cryptanalyst can always make use of probable words, combined with trial and error, until a consistent solution is obtained.

The conditions on the size of the key make the amount of trial and error small, and the condition about ideal systems is necessary, since these automatically give consistency checks. The existence of probable words and phrases is implied by the condition of natural languages.

Conversely, it seems reasonable that if the key is difficult to obtain, knowing a text and its cryptogram, then the system should be strong.

Note that this requirement by itself is not contradictory to the requirements that enciphering and deciphering be simple processes. Using functional notation we have for enciphering

$$E = f(K, M)$$

and for deciphering

$$M = g(K, E).$$

Both of these may be simple operations on their arguments without the third equation

$$K = h(M, E)$$

being simple.
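The three functions f, g and h can be made concrete for the simple Vigenère, a system that fails the acid test because h is as easy as f and g. The sketch below is an illustration only: the choice of cipher, the function names, and the assumption that the period is already known are not taken from the text.

```python
def to_numbers(text: str) -> list[int]:
    """Code letters as A=0 ... Z=25, dropping everything else."""
    return [ord(c) - ord("A") for c in text.upper() if "A" <= c <= "Z"]

def encipher(key: list[int], message: list[int]) -> list[int]:
    """E = f(K, M): add the repeating key to the message, mod 26."""
    return [(m + key[i % len(key)]) % 26 for i, m in enumerate(message)]

def decipher(key: list[int], cryptogram: list[int]) -> list[int]:
    """M = g(K, E): subtract the repeating key from the cryptogram, mod 26."""
    return [(e - key[i % len(key)]) % 26 for i, e in enumerate(cryptogram)]

def recover_key(message: list[int], cryptogram: list[int], period: int) -> list[int]:
    """K = h(M, E): one subtraction per key letter, given a known period."""
    return [(cryptogram[i] - message[i]) % 26 for i in range(period)]

key = [3, 1, 4]
clear = to_numbers("ATTACK AT DAWN")
crypt = encipher(key, clear)
print(decipher(key, crypt) == clear)
print(recover_key(clear, crypt, period=len(key)) == key)
```

A probable word placed correctly against the cryptogram plays the role of `clear` here, which is why a simple h makes the probable word method so effective.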
We may also point out that in investigating a new type of ciphering system one of the best methods of attack is to consider how the key could be determined if a sufficient amount of M and E were given.

With a small key, the work required to solve a system, given a large amount of data, may be expected to be not more than a few orders of magnitude greater than the work required to obtain the key from a small amount of data when both M and E are known.

The same principle of confusion can be (and must be) used here to create difficulties for the cryptanalyst. Given $M = m_1 m_2 \ldots m_s$ and $E = e_1 e_2 \ldots e_s$ the cryptanalyst can set up equations for the different key elements $k_1, k_2, \ldots, k_r$ (namely the enciphering equations):

$$e_1 = f_1(m_1, m_2, \ldots, m_s;\; k_1, \ldots, k_r)$$
$$e_2 = f_2(m_1, m_2, \ldots, m_s;\; k_1, \ldots, k_r)$$
$$\cdots$$
$$e_s = f_s(m_1, m_2, \ldots, m_s;\; k_1, \ldots, k_r)$$

All is known, we assume, except the $k_i$. Each of these equations should therefore be complex in the $k_i$, and involve many of them. Otherwise the enemy can solve the simple ones and then the more complex ones by substitution.

From the point of view of increasing confusion, it is desirable to have the $f_i$ involve several $m_i$, especially if these are not adjacent and hence less correlated. This introduces the undesirable feature of error propagation, however, for then each $e_i$ will generally affect several $m_i$ in deciphering, and an error will spread to all these.

We conclude that much of the key should be used in an involved manner in obtaining any cryptogram letter from the message to keep the work characteristic high. Further a dependence on several uncorrelated $m_i$ is desirable, if some propagation of error can be tolerated. We are led by all three of the arguments of these sections to consider "mixing transformations."

37. Mixing Transformations

A notion that has proven valuable in certain branches of probability theory is the concept of a "mixing transformation." Suppose we have a probability or measure space $\Omega$, and a measure preserving transformation T of the space into itself, i.e., a transformation such that the measure of a transformed region TR is equal to the measure of the initial region R. The transformation is called mixing if for any function f defined over the space, and any region R,

$$\lim_{n \to \infty} \int_{T^n R} f(P)\, dP = \int_R dP \int_\Omega f(P)\, dP.$$

This means that any initial region of the space R under successive applications of T is mixed into the entire space $\Omega$ with uniform density. In general $T^n R$ becomes a region consisting of a large number of thin filaments spread throughout the space. As n increases the filaments become finer and their density more nearly constant.

An example of a mixing transformation is shown in Fig. 21. Here measure is identified with Euclidean area. The space is the triangle and $T^N P$ is the point N units of distance above point P, providing this does not go outside the triangle. When the top of the triangle is reached a point is transferred first to the point directly beneath, and then over to the right an irrational fraction of the base width. If this carries the point beyond the right edge the extra distance is measured from the left edge. Successive transforms of a square region are shown in Fig. 21. For N very large the square is turned into a uniform grating of nearly parallel thin strips covering the triangle.

A mixing transformation in this precise sense can occur only in a space with an infinite number of points, for in a finite point space the transformation must be periodic. Speaking loosely, however, we can think of a mixing transformation as one which distributes any reasonably cohesive region in the space fairly uniformly over the entire space.
If the first region could be described in simple terms, the second would require very complex ones. In the case of cryptographic interest, the original region is all of a certain simple statistical structure - after the mix the region is distributed and the structure diffused and confused.

Good mixing transformations are often formed by repeated products of two simple non-commuting operations. See for example the mixing of pastry dough discussed by Hopf.* The dough is first rolled out into a thin slab, then folded over, then rolled, and then folded again, etc.

In a good mixing transformation of a space with natural coordinates $X_1, X_2, \ldots, X_S$ the point $X_i$ is carried by the transformation into a point $X_i'$, with

$$X_i' = f_i(X_1, X_2, \ldots, X_S), \qquad i = 1, 2, \ldots, S$$

and the functions $f_i$ are complicated, involving all the variables in a "sensitive" way. A small variation of any one, $X_3$ say, changes all the $X_i'$ considerably. If $X_3$ passes through its range of possible variation the point $X_i'$ traces a long winding path around the space.

Various methods of mixing applicable to statistical sequences of the type found in natural languages can be devised. One which looks fairly good is to follow a preliminary transposition by a sequence of alternating substitutions and simple linear operations, adding adjacent letters mod 26 for example. Thus

$$H = LSLSLT$$

where T is a transposition, L is a linear operation, and S is a substitution.

* E. Hopf, On Causality, Statistics and Probability, Journal of Math. and Physics, V. 13, pp. 51-102, 1934.

38. Ciphers of the Type $T_k H S_j$

Suppose that H is a good mixing transformation that can be applied to sequences of letters and that $T_k$ and $S_j$ are any two simple families of transformations, i.e., two simple ciphers, which may be the same.
For concreteness we may think of them as both simple substitutions.

It appears that the cipher THS will be a very good ciphering system from the standpoint of its work characteristic. In the first place it is clear on reviewing our arguments about statistical methods that no simple statistics will give information about the key - any significant statistics derived from E must be of a highly involved and very sensitive type - the redundancy has been both diffused and confused by the mixing H. Also probable words lead to a complex system of equations involving all parts of the key (when the mix is good), which must be solved simultaneously. The bad features of such a system are propagation of errors and complexity of operations, both of which get worse as the mixing of H gets better.

It is interesting to note that if the cipher T is omitted the remaining system is similar to S and thus no stronger. The enemy merely "unmixes" the cryptogram by application of $H^{-1}$ and then solves. If S is omitted the remaining system is much stronger than T alone if the mix is good, but still not comparable to THS.

The basic principle here of simple ciphers separated by a mixing transformation can of course be extended. For example one could use

$$T_k H_1 S_j H_2 R_l$$

with two mixes and three simple ciphers. One can also simplify by using the same ciphers, and even the same keys (inner product) as well as the same mixing transformations. This might well simplify the mechanization of such systems.

The mixing transformation which separates the two (or more) appearances of the key acts as a kind of barrier for the enemy - it is easy to carry a known element over this barrier but an unknown (the key) does not go easily.
By supplying two sets of unknowns, the key for S and the key for T, and separating them by the mixing transformation H we have "tangled" the unknowns together in a way that makes solution very difficult.

Although systems constructed on this principle would be extremely safe they possess one grave disadvantage. If the mix is good then the propagation of errors is bad. A transmission error of one letter will affect several letters on deciphering.

39. The Compound Vigenère

In the compound Vigenère several keys of length $d_1, d_2, \ldots, d_S$ are written under the message and added to it modulo 26 to obtain the cryptogram. The result is a Vigenère with key of special type, whose repetition is of period d, the least common multiple of $d_1, d_2, \ldots, d_S$. If we have three keys of periods 2, 3, 5 the total period is 30 and the total key size (2+3+5) x 1.41 = 14.1 digits. The situation is then

M   = m1 m2 m3 m4 m5 m6 ...
K1  = a1 a2 a1 a2 a1 a2 ...
K2  = b1 b2 b3 b1 b2 b3 ...
K3  = c1 c2 c3 c4 c5 c1 ...
E   = e1 e2 e3 e4 e5 e6 ...

with

$$e_1 = m_1 + a_1 + b_1 + c_1$$
$$e_2 = m_2 + a_2 + b_2 + c_2$$

etc. If we assume M and E known then, letting $h_i = e_i - m_i$,

$$a_1 + b_1 + c_1 = h_1 \qquad a_2 + b_3 + c_1 = h_6$$
$$a_2 + b_2 + c_2 = h_2 \qquad a_1 + b_1 + c_2 = h_7$$
$$a_1 + b_3 + c_3 = h_3 \qquad a_2 + b_2 + c_3 = h_8$$
$$a_2 + b_1 + c_4 = h_4 \qquad a_1 + b_3 + c_4 = h_9$$
$$a_1 + b_2 + c_5 = h_5 \qquad a_2 + b_1 + c_5 = h_{10}$$

These equations are easily solved for the key, although not as easily as in the simple Vigenère or other simple ciphers. As the number of constituent periods increases the solution becomes more involved and time consuming. In any case we have a system of simultaneous equations each involving S of the total of $B = \sum d_i$ unknowns. The unicity point will occur at about 2B letters and if several times this amount of material is intercepted no great difficulty should be encountered in breaking the cipher, providing S is not more than say 6 or 8.
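The bookkeeping for the three-key example above can be sketched as follows. This is a minimal illustration with made-up key letters; the function names are not from the memorandum, and the key size in digits is taken as the number of key letters times log10 26, as in the text.

```python
import math
from functools import reduce

def total_period(periods: list[int]) -> int:
    """Least common multiple of the component periods."""
    return reduce(lambda a, b: a * b // math.gcd(a, b), periods, 1)

def key_size_digits(periods: list[int]) -> float:
    """Key size in decimal digits: one letter (log10 26 digits) per key place."""
    return sum(periods) * math.log10(26)

def compound_encipher(message: list[int], keys: list[list[int]]) -> list[int]:
    """e_n = m_n + a + b + c + ... (mod 26), each key repeating with its own period."""
    return [(m + sum(k[n % len(k)] for k in keys)) % 26
            for n, m in enumerate(message)]

def key_sums(message: list[int], cryptogram: list[int]) -> list[int]:
    """h_n = e_n - m_n (mod 26): the known right-hand sides of the key equations."""
    return [(e - m) % 26 for m, e in zip(message, cryptogram)]

keys = [[1, 2], [3, 4, 5], [6, 7, 8, 9, 10]]   # hypothetical keys, periods 2, 3, 5
message = list(range(26)) * 3                  # any 78-letter message will do
h = key_sums(message, compound_encipher(message, keys))

print(total_period([2, 3, 5]))                 # period of the combined key
print(round(key_size_digits([2, 3, 5]), 1))    # total key size in digits
print(all(h[n] == h[n + 30] for n in range(len(h) - 30)))
```

Each `h[n]` is the sum of exactly one letter from each key, which is the linear structure the equations above exploit.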
With the first 9 primes as periods we have a key size of 100 letters or about 141 digits, the unicity distance is about 200 letters and the key does not repeat for 223,092,870 letters.

This system, although much better than such methods as simple substitution, transposition and simple Vigenère with equivalent key size, does not utilize the available key fully in making the cryptanalyst work for the solution. The equations only involve S of the B key unknowns and those in a simple fashion. The equations easily combine and reduce to eliminate unknowns. If a large amount of material is available, compared to the unicity distance, particular sets of equations can be combined to eliminate unknowns very easily. The system possesses the important advantage, however, of not expanding errors. One incorrect letter of cryptogram produces one incorrect letter of deciphered text.

By relatively simple changes this system could be strengthened considerably. If the equations for the key elements (with M and E known) could be made into higher degree equations rather than linear ones the difficulty of solution would increase tremendously. This could easily be done in a mechanical device by successive multiplications (mod 26) of the key letters according to some prearranged scheme.

40. Incompatibility of the Criteria for Good Systems

The five criteria for good secrecy systems given in section 12 appear to have a certain incompatibility when applied to a natural language with its complicated statistical structure. With artificial languages having a simple statistical structure it is possible to satisfy all requirements simultaneously, by means of the ideal type ciphers. In natural languages it seems that a compromise must be made and the valuations balanced against one another with a view toward the particular application.

If any one of the five criteria is dropped, the other four can be satisfied fairly well, as the following examples show.

1. If we omit the first requirement (amount of secrecy) any simple cipher such as simple substitution will do. In the extreme case of omitting this condition completely, no cipher at all is required and one sends the clear.
2. If the size of the key is not limited the Vernam system can be used.
3. If complexity of operation is not limited, various extremely complicated types of enciphering process can be used. The modified compound Vigenère described above with many different periods compounded is fairly satisfactory as an example here, although it falls down somewhat on the key size condition. Ideal systems and enciphered codes are also fair examples although not too good from the propagation of error point of view.
4. If we omit the propagation of error condition systems of the type THS would be very good, although somewhat complicated.
5. If we allow large expansion of message, various systems are easily devised where the "correct" message is mixed with many "incorrect" ones (misinformation). The key determines which of these is correct.

A rough argument for the incompatibility of the five conditions may be given as follows. From condition 5, secrecy systems essentially as studied in this paper must be used; i.e., no great use of nulls, etc. Perfect and ideal systems are excluded by condition 2 and by 3 and 4, respectively. The high secrecy required by 1 must then come from a high work characteristic, not from a high equivocation characteristic. If the key is small, the system simple, and the errors do not propagate, probable word methods will generally solve the system fairly easily, since we then have a fairly simple system of equations for the key.

This reasoning is too vague to be conclusive, but the general idea seems quite reasonable. Perhaps if the various criteria could be given quantitative significance, some sort of an exchange equation could be found involving them and giving the best physically compatible sets of values.
The two most difficult to measure numerically are the complexity of operations, and the complexity of statistical structure of the language.

Appendix 1

Deduction of $-\sum p_i \log p_i$

It will be shown that the measure of choice $-\sum p_i \log p_i$ is a logical consequence of three quite reasonable assumptions about the desired properties of such a measure. The three assumptions are:

(1) There exists a function $C(p_1, p_2, \ldots, p_n)$ continuous in the $p_i$, measuring the amount of "choice" when there are n possibilities with probabilities $p_i$.

(2) C has the property that if a given choice be broken down into two successive choices the total amount of choice is the weighted sum of the individual choices. For example, suppose the choice is from 4 possibilities A, B, C, D with probabilities .1, .2, .3, .4. This can be broken down into a preliminary choice between the pair A, B and the pair C, D. Pair A, B has a total probability .1 + .2 = .3 and pair C, D a probability .3 + .4 = .7. If pair A, B is chosen a second choice between A and B must be made with probabilities $\frac{.1}{.1+.2} = \frac{1}{3}$ and $\frac{.2}{.1+.2} = \frac{2}{3}$. If pair C, D is chosen a second choice between C and D must be made with probabilities $\frac{3}{7}$ and $\frac{4}{7}$. Thus broken down we have a preliminary amount of choice C(.3, .7) and .3 of the time a secondary choice of $C(\frac{1}{3}, \frac{2}{3})$ while .7 of the time the secondary choice is $C(\frac{3}{7}, \frac{4}{7})$. Our condition requires that the total choice C(.1, .2, .3, .4) be the same as the weighted sum of the different choices when decomposed, weighted in accordance with the frequency of occurrence. Thus we require in this case

$$C(.1, .2, .3, .4) = C(.3, .7) + .3\, C\!\left(\tfrac{1}{3}, \tfrac{2}{3}\right) + .7\, C\!\left(\tfrac{3}{7}, \tfrac{4}{7}\right).$$

(3) If $A(n) = C\!\left(\frac{1}{n}, \frac{1}{n}, \ldots, \frac{1}{n}\right)$, the choice when there are n equally likely possibilities, then A(n) is monotonic increasing in n.

Theorem. Under these three assumptions

$$C(p_1, p_2, \ldots, p_n) = -K \sum p_i \log p_i$$

where K is a positive constant.
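The worked example in assumption (2) can be checked numerically against the form stated in the theorem. The sketch below assumes K = 1 and logarithms to base 2, both arbitrary choices of unit; the function names are illustrative.

```python
import math

def choice(probabilities, K=1.0):
    """C(p_1, ..., p_n) = -K * sum of p_i log2 p_i, skipping zero terms."""
    return -K * sum(p * math.log2(p) for p in probabilities if p > 0)

def A(n):
    """Choice among n equally likely possibilities."""
    return choice([1 / n] * n)

# Direct choice among A, B, C, D versus the two-stage decomposition.
direct = choice([0.1, 0.2, 0.3, 0.4])
staged = (choice([0.3, 0.7])
          + 0.3 * choice([1 / 3, 2 / 3])
          + 0.7 * choice([3 / 7, 4 / 7]))

print(math.isclose(direct, staged))   # assumption (2) holds for this form
print(A(2) < A(3) < A(4))             # assumption (3): A(n) increases with n
```

This only confirms that the stated form satisfies the assumptions in one instance; the proof that it is the only such form is the argument that follows.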
From condition (2) we can decompose a choice from $s^m$ equally likely possibilities into a series of m choices each from s equally likely possibilities and obtain

$$A(s^m) = m\, A(s).$$

Similarly

$$A(t^n) = n\, A(t).$$

We can choose n arbitrarily large and find an m to satisfy

$$s^m \le t^n < s^{m+1}.$$

Thus, taking logarithms and dividing by n log s,

$$\frac{m}{n} \le \frac{\log t}{\log s} \le \frac{m}{n} + \frac{1}{n} \qquad \text{or} \qquad \left|\frac{m}{n} - \frac{\log t}{\log s}\right| < \epsilon$$

where $\epsilon$ is arbitrarily small. Now from the monotonic property of A(n)

$$A(s^m) \le A(t^n) \le A(s^{m+1})$$
$$m\, A(s) \le n\, A(t) \le (m+1)\, A(s).$$

Hence, dividing by n A(s),

$$\frac{m}{n} \le \frac{A(t)}{A(s)} \le \frac{m}{n} + \frac{1}{n} \qquad \text{or} \qquad \left|\frac{m}{n} - \frac{A(t)}{A(s)}\right| < \epsilon$$

$$\left|\frac{A(t)}{A(s)} - \frac{\log t}{\log s}\right| \le 2\epsilon, \qquad A(t) = K \log t$$

where K must be positive to satisfy (3).

Now suppose we have a choice from n possibilities with commensurable probabilities $p_i = \frac{n_i}{\sum n_i}$ where the $n_i$ are integers. We can break down a choice from $\sum n_i$ possibilities into a choice from n possibilities with probabilities $p_1, \ldots, p_n$ and then, if the ith was chosen, a choice from $n_i$ with equal probabilities. Using condition (2) again, we equate the total choice from $\sum n_i$ as computed by two methods

$$K \log \sum n_i = C(p_1, \ldots, p_n) + K \sum p_i \log n_i.$$

Hence

$$C = K\left[\sum p_i \log \sum n_i - \sum p_i \log n_i\right] = -K \sum p_i \log \frac{n_i}{\sum n_i} = -K \sum p_i \log p_i.$$

If the $p_i$ are incommensurable, they may be approximated by rationals and the same expression must hold by our continuity assumption. The choice of the coefficient K is a matter of convenience and amounts to the choice of a unit of measure.

Appendix 2

Proof of Theorem 4

Select any message $M_1$ and group together all cryptograms that can be obtained from $M_1$ by an enciphering operation $T_i$. Let this class of cryptograms be $C_1'$. Group with $M_1$ all messages that can be obtained from $M_1$ by $T_i^{-1} T_j M_1$, and call this class $C_1$. The same $C_1'$ would be obtained if we started with any other M in $C_1$ since

$$T_s T_j^{-1} T_i M_1 = T_l M_1.$$

Similarly the same $C_1$ would be obtained.

Choosing an M (if any exist) not in $C_1$ we construct $C_2$ and $C_2'$ in the same way. Continuing in this manner we obtain the residue classes with properties (1) and (2). Let $M_1$ and $M_2$ be in $C_1$ and suppose

$$M_2 = T_1^{-1} T_2 M_1.$$

If $E_1$ is in $C_1'$ and can be obtained from $M_1$ by

$$E_1 = T_\alpha M_1 = T_\beta M_1 = \cdots = T_\eta M_1,$$

then

$$E_1 = T_\alpha T_2^{-1} T_1 M_2 = T_\beta T_2^{-1} T_1 M_2 = \cdots = T_\lambda M_2 = T_\mu M_2 = \cdots$$

Thus each $M_i$ in $C_1$ transforms into $E_1$ by the same number of keys. Similarly each $E_i$ in $C_1'$ is obtained from any M in $C_1$ by the same number of keys. It follows that this number of keys is a divisor of the total number of keys and hence we have properties (3) and (4).

Appendix 3

Equivocation of Message for Random Cipher

As before let $M_1, \ldots, M_s$ be high probability messages and $M_{s+1}, \ldots, M_H$ have zero probability. Let $P(m_1, m)$ be the probability of just $m_1$ lines going from a particular E, say $E_1$, to a particular high probability M, say $M_1$, with a total of m lines to all high probability M. Then

$$P(m_1, m) = \binom{k}{m}\binom{m}{m_1}\left(\frac{1}{H}\right)^{m_1}\left(\frac{s-1}{H}\right)^{m-m_1}\left(1-\frac{s}{H}\right)^{k-m}.$$

The probability of intercepting an E with m lines to high probability M's is [expression illegible in the scan]. The Q(M) expected can be thought of as contributed to by the various $M_i$ in the high probability group. Thus $M_1$ contributes

$$-\frac{m_1}{m} \log \frac{m_1}{m} = \frac{m_1}{m} \log \frac{m}{m_1}$$

if there are $m_1$ lines to $M_1$ and a total of m to high probability M's. The expected Q is then [expression illegible in the scan]. The factor H sums over the various $E_i$ and the factor s sums over the different $M_i$ (i = 1, ..., s). Hence,

$$Q(M) = \frac{H}{k} \sum P(m_1, m)\, m_1 \left[\log m - \log m_1\right].$$

The term

$$\sum P(m_1, m)\, m_1$$

summed on $m_1$ gives the expected $m_1$ when m lines go to high probability M's, i.e., m/s. Hence the first term is

$$\frac{H}{sk} \sum m\, P(m) \log m = Q(K)$$

by our previous work.
The second term is

$$-\frac{H}{k} \sum P(m_1, m)\, m_1 \log m_1.$$

If the expected $m_1$ is much less than 1 this term is small since it vanishes for $m_1 = 0$ or 1. The expected $m_1$ is k/H. Thus beyond this point Q(M) approaches closely to Q(K). The point in question is where

$$|K| = |M_0| = R_0 N \qquad \text{or} \qquad N = \frac{|K|}{R_0}.$$

If the expected $m_1$ is much greater than 1 the $\log m_1$ can be taken out as $\log m_1 \approx \log \frac{k}{H}$ and we have

$$-\frac{H}{k} \log \frac{k}{H} \sum P(m_1)\, m_1 = -\log \frac{k}{H} = |M_0| - |K|.$$

In this region then

$$Q(M) = |M_0| - |K| + Q(K)$$

but here $Q(K) = |K| - |M_0| + |M|$, and therefore

$$Q(M) = |M| = RN.$$

In the transition region the expected $m_1$ is about 1 and H will in ordinary cases be very large. It is admissible then to replace $P(m_1, m)$ by $P(m_1)$, since this will not depend on m to any extent except for values of m of very small probability. Thus we obtain for this region

$$Q(M) - Q(K) = -\frac{H}{k} \sum P(m_1)\, m_1 \log m_1.$$

The sum has the same form as our expression for Q(K) but with 1/H in place of s/H. The calculations for Q(K) can be used, therefore, with only a change of the N scale by a factor of [factor illegible in the scan].

Appendix 4

Key Appearance in Simple Substitution with Independent Letters

If successive letters are chosen independently and the different letters have probabilities $p_1, p_2, \ldots, p_S$, we may calculate the expected number of different letters when N letters have been intercepted. It is

$$L(N) = S - \sum (1 - p_i)^N.$$

To prove this, imagine all possible sequences of N letters written down, each with a frequency corresponding to its probability, giving a total of say A sequences. Letter 1 does not appear in $(1 - p_1)^N A$ of these; letter 2 does not appear in $(1 - p_2)^N A$, etc. Therefore, the total number of letters missing from sequences is

$$A \sum (1 - p_i)^N.$$

Dividing by A gives us by definition the expected number of missing letters from a random sequence, $\sum (1 - p_i)^N$. The number of different letters expected in a sequence is the total number of letters S minus this, giving the desired result.

If all the $p_i$ are equal this reduces to $S - S\left(1 - \frac{1}{S}\right)^N$, an exponential approach to S. In the general case there is a series of exponentials with different time constants, corresponding to different $p_i$, which are added to give L(N).

With the frequencies of normal English used for $p_i$, we obtain the curve shown in Fig. 25, along with an experimental curve. The small discrepancy can be attributed to influences of nearby letters. (In English there is less tendency to double letters than there would be if the letters were independent but with the same probabilities. For English the probability of a doubled digram is

$$\sum p(i, i) = .0315$$

while if letters were independent it would be

$$\sum p_i^2 = .0670.)$$

Appendix 5

A Theoretical Case Where All Invariant Statistics of E Are Independent of K

By an invariant statistic of a sequence of letters $E = \ldots, e_{-2}, e_{-1}, e_0, e_1, e_2, e_3, \ldots$ we will mean a statistic which is averaged along the length of the sequence E. More precisely a statistic of the form

$$\lim_{n \to \infty} \frac{1}{2n+1}\left[F(E_{-n}) + \cdots + F(E_{-1}) + F(E) + F(E_1) + F(E_2) + \cdots + F(E_n)\right]$$

where F is any function whose argument is a possible sequence, and $E_{\pm N}$ is the sequence E shifted N letters to the right or left. Such statistics as the relative frequency of a given letter, of a given n-gram, transition frequencies, and frequencies with which letter i is followed by letter j at a distance n are all invariant.

We will describe a system in which every invariant statistic which the cryptanalyst can construct from the (infinite) intercepted E is independent of both K and M, and thus gives no information to him. This effect and still more occurs with the ideal ciphers of course, but here it is obtained independently of the original message statistics and without any matching of the cipher to the language.
Let $N$ be a "random" sequence of letters,

$N = \ldots, n_{-2}, n_{-1}, n_0, n_1, n_2, n_3, \ldots$

This is supposedly a known sequence (to the enemy) and thus a part of the system, not of the key. Apply any simple cipher to the message and then add $N$ letter by letter to the result (mod 26). The "sum" is the enciphered message. It is evident that any invariant statistic on $E$ will be (with probability 1) the same as that for a random sequence. Hence it is independent of both $K$ and $M$.

We need hardly add that such a system is easily broken: the enemy merely subtracts $N$ from $E$ and then solves the simple residual cipher, which may often be done with invariant statistics.

Appendix 6

Maximum Repetition Rate in Compound Systems for a Given To[tal Key Size]

We consider briefly the question of how to arrange the component periods in a compound Vigenère or Transposition cipher to obtain the longest period for a given total key size. If the component periods are $p_1, p_2, \ldots, p_s$ it is clear that they should be coprime. Otherwise the total key, which is $\sum p_i$, could be reduced without changing the period, which is the least common multiple of the $p_i$, merely by deleting a factor which appears in several of the $p_i$ from all but one. Also each $p$ must be a power of a prime, for if it contains two primes, it can be divided into these parts, reducing the key and not affecting the period. Thus the component periods are selections from the series of primes and powers of primes,

A: 2, 3, 4, 5, 7, 8, 9, 11, 13, 16, 17, 19, 23, 25, 27, ...

the selection being pairwise coprime.

It appears from empirical evidence that the best set of component periods for a given total size $S$ is found by the following process.

1. Determine the largest $M$ such that $\sum_{i=1}^{M} p_i \le S$, where the $p_i$ are the primes in increasing order. This is the maximum number of periods where the periods are coprime, and is the number of periods to be used.

2. Choose from the sequence A, $M$ elements, consecutive except for the fact that no prime is represented more than once, the $M$ elements being as great as possible with sum $\le S$.

3. If the sum is $< S$ move as many as possible of the elements in this block up a notch in the sequence while still satisfying the conditions on the sum and coprimality.

4. Repeat 3 to either part of the original block if possible.

This process eventually ends and apparently gives the proper decomposition.

For example with $S = 50$, the sum of the first 6 primes is 41, of the first 7 is 58. Hence 6 periods will be used. We have

11 + 9 + 8 + 7 + 5 + 3 = 43
13 + 11 + 9 + 8 + 7 + 5 = 53

hence we start with the block 11, 9, 8, 7, 5, 3. The elements 11, 9, 8, 7 can be moved up a notch, giving

13 + 11 + 9 + 8 + 5 + 3 = 49

No further improvement seems possible; we obtain

P = 13 × 11 × 9 × 8 × 5 × 3 = 154,440

The products and sums of the first $n$ primes are given below.

n          1   2   3    4     5      6       7        8          9
p_n        2   3   5    7    11     13      17       19         23
Sum        2   5  10   17    28     41      58       77        100
Product    2   6  30  210  2310  30030  510510  9699690  223092870

C. E. SHANNON

Att. Figures 1-25.

[Figures. Legible labels in the scan: a schematic of a secrecy system (message source, message M, encipherer, cryptogram E, enemy cryptanalyst, decipherer, key source, key K); "Pure system" (message and cryptogram residue classes), Fig. 10; "Calculation of Q curves", Fig. 16; "Ideal characteristics" (Q against N, number of letters; curves marked ideal and strongly ideal), Fig. 20. Other legible figure numbers: 6, 8, 19, 22, 23.]

September 19, 194[?] [file number illegible in the scan]

Introduction.

In classical mechanics one considers situations where the state of a system is described by a set of numbers, the coordinates of the phase space of the system, and the dynamical behavior is controlled by a set of ordinary differential equations.
Such a system is entirely determinate; the future is completely specified by the present state and the dynamical equations, since these differential equations have, in general, a unique solution passing through a given point.

In other branches of physics (heat flow, Brownian motion, diffusion, etc.) there are situations which can be called completely statistical. The path of a particle of gas is described only statistically and no determinate or mean behavior occurs. In this case one studies the flow of probability, which is described by a partial differential equation of the heat flow type.

In the present memorandum I discuss a partial differential equation in which both effects occur: there is a definite "mean" motion of a system, determinate in character, carrying its representative point through phase space in the classical manner, with a superimposed statistical effect continually perturbing it from this path.

In such a case the future coordinates of the system cannot be precisely predicted; only a probability distribution function can be determined for the future time, whose value times the volume element $dV$ is the probability that the system will be in the volume element $dV$ around the point in question. For a short time the system is substantially determinate, the distribution being concentrated around a point which moves according to the determinate part of the equation. As the statistical effects come into play this distribution broadens out and in general approaches a limiting distribution which is independent of the initial state of the system.

In some respects the situation is similar to that in quantum mechanics, where systems are described only by probabilities (or more precisely by wave functions whose squared amplitudes are probabilities). There is this difference however; in quantum mechanics even the initial state cannot be precisely described due to the uncertainty principle.
Conjugate variables cannot both be measured simultaneously with exactness. In the systems we consider here there are assumed to be no difficulties of this nature; all coordinates can be simultaneously and precisely measured. This corresponds to the difference in the fundamental equation from that of quantum mechanics: Schrödinger's equation is for the wave function $\psi$, while the equation considered here deals directly with the probability density. Thus the present work is adapted to "molar" statistical situations.

This sort of analysis may be expected to apply to many problems where the actual situation is quite complicated but a partial theoretical analysis is possible. This partial analysis is used for the determinate part of the equation, and the other complex disturbing effects are treated statistically. Such situations may occur in economics, sociology, history, etc. as well as in many engineering and physical problems.

G. R. Stibitz in a series of memoranda has considered a similar problem in connection with the stability of a periodically closed servo system. In his case the phase space of the system consisted of a set of discrete points, and the fundamental equation is a difference equation. In the case considered here (which was suggested by Stibitz' work) the variables are continuous and a differential equation is involved.

In a determinate system with an $n$ dimensional phase space, whose motion is described by differential equations, we have

$\frac{dx^i}{dt} = f^i(x^1, x^2, \ldots, x^n), \quad i = 1, 2, \ldots, n \qquad (1)$

where the $x^i$ are coordinates in the phase space and $t$ is time. If we start with a probability distribution of points in phase space $P(x^1, \ldots, x^n, t)$, giving the probability density in the differential volume element about $x^1, \ldots, x^n$ at time $t$, this distribution changes with time. Its motion is described by the partial differential equation

$\frac{\partial P}{\partial t} + \sum_i \frac{\partial}{\partial x^i} \left( f^i P \right) = 0$

or in tensor notation

$\frac{\partial P}{\partial t} + (f^i P)_{,i} = 0$

This is evident if we think of $P$ as a fluid density whose velocity field is $f^i$.
Now suppose that as the representative point of the system moves about the phase space it is continually subject to small disturbances, which are of a probability type. Thus the system tends to follow the solution of (1) but is continually being disturbed by the probability effects, which may be thought of as something like molecular collisions of the surrounding medium on a moving particle. We are interested in the limiting case where the disturbing effects are very rapid but very small. If we assume that the disturbance is homogeneous and isotropic, this can be represented by an additional term in the equation of the heat flow type, $K \nabla^2 P$. In the more general case certain directions may be preferred, and certain regions may have greater perturbation effects. Thus there will generally be a small ellipsoid of probability about each point, and a corresponding positive definite quadratic form defined over the phase space. This form describes the local statistical perturbing effects for each point. The equation then assumes the form

[equation illegible in the scan]

This partial differential equation governs the flow of probability in the phase space. With an ensemble of systems distributed at $t = 0$ according to $P_0(x^i)$, the distribution at a later time $t_1$ is the solution of this equation for [condition illegible in the scan]. The equation is linear and of parabolic type (in $t$). In the $x^i$ it is elliptical, since $a^{ij}$ is positive definite.

The total probability in the phase space remains constant, for if we let [equation illegible in the scan], the integral being over a sufficiently large surface, and $dV$ the volume element.

If $a^{ij}$ is positive definite and both $a^{ij}$ and $f^i$ are continuous in the phase space, the distribution $P$ approaches a unique limit as $t \to \infty$. This limit is either zero everywhere, the probability dispersing to infinity, or a definite limiting distribution with [condition illegible in the scan].
The limiting distribution must satisfy the elliptical equation obtained by setting $\partial P / \partial t = 0$.

To show that the distribution approaches a limit, let $P_1$ and $P_2$ be two different solutions of the equation. Then the difference $Q = P_1 - P_2$ also satisfies the equation, and $Q$ is positive in one region $B$ and negative in the remainder of the space. Consider the quantity [expression illegible in the scan]. It must decrease, for [equation illegible in the scan], where $S$ is the surface of the region $B$ and $V$ is the outward velocity of this surface. Since $Q$ vanishes on the surface, the second term is zero, and the first is a volume integral of divergences and transforms by the usual theorems into surface integrals [equation illegible in the scan]. The second term again vanishes since $Q = 0$ on $S$. In the first term [symbol illegible] is in the direction of [symbol illegible], so at any point we have [inequality illegible in the scan]. Thus for any initial distributions $P_1$ and $P_2$ [quantity illegible] is decreasing, and the two approach the same limit.

[Sentence illegible in the scan; it concerns a discontinuity in the coefficients.] The amount of this discontinuity is given by [equation illegible in the scan], where the barred and unbarred letters refer to the two sides of the discontinuity. [Sentence illegible in the scan.] In the simplest one dimensional case we have [equation illegible in the scan].

If we start with a "spike" of probability located at one point, the immediate behavior can be described in simple terms. Near this point we may assume $a^{ij}$ and $f^i$ to be constant. Due to the $f^i$ the spike starts moving with a velocity $f^i$, while the probability term $a^{ij}$ spreads it out. If we change variables from $x^i$ to [definition illegible in the scan] we have [equation illegible in the scan], and the equation becomes [equation illegible in the scan]. This is the equation for heat flow in an anisotropic medium. Thus in the $y^i$ coordinates the spike diffuses out into a normal distribution with quadratic form [expression illegible in the scan] for the first short interval of time, where $A_{ij}$ is the inverse form of $a^{ij}$.

Linear velocity field and homogeneous statistical effects.

One particular case of interest is that in which [condition illegible in the scan] in the space.
For a one dimensional phase space, the equation then assumes the form [equation illegible in the scan]. A general solution for this case has been found. It may be described as follows. If the initial distribution is a $\delta$ function, so the system (or ensemble) is known to have a definite value of $x$ at $t = 0$, say [symbol illegible], then at $t_1$ the distribution is normal. [Equations for the center and the variance illegible in the scan.] Thus the mean $\bar{x}$ decreases as [expression illegible], the same result as the system would follow were the statistical effects absent. The variance $\sigma^2$ increases exponentially to a limiting value [ratio illegible] with half the time constant. To prove that this is the solution it is only necessary to substitute in the equation. As $t \to \infty$ the distribution approaches a normal one centered on zero with $\sigma^2$ equal to this limiting value [equations illegible in the scan].

With an arbitrary initial distribution $P_0(x)$ the solution can be written as an integral using the method of [illegible] of heat flow problems [equation illegible in the scan].

The same general results hold in the $n$ dimensional case when $f^i$ is a linear form and $a^{ij}$ is constant. A "spike" of probability spreads into a normal distribution, the center following the determinate trajectory and the quadratic form which takes the place of the standard deviation [illegible] out exponentially toward a definite limit. The evaluation of the constants is much more complicated in this case however. The equations for the final distribution are given in the appendix.

If in the one dimensional linear case we start with a normal distribution centered on zero with $\sigma^2$ equal to the limiting value, the distribution does not vary with time. An individual system executes statistical motion about zero and the ensemble of systems produces an ensemble of time series.
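The one dimensional linear case described above can be explored numerically. Because the memorandum's own equation is not legible, the sketch below assumes a standard form, $\partial P/\partial t = \partial(\alpha x P)/\partial x + a\,\partial^2 P/\partial x^2$, for which a spike at $x_0$ has mean $x_0 e^{-\alpha t}$ and variance $(a/\alpha)(1 - e^{-2\alpha t})$. The symbols `alpha` and `a`, the function names, and the step-by-step simulation scheme are all assumptions made for illustration, not a reconstruction of the author's notation.

```python
import math
import random


def mean_and_variance(x0, alpha, a, t):
    """Closed-form moments of the assumed equation, starting from a spike at x0."""
    mean = x0 * math.exp(-alpha * t)
    variance = (a / alpha) * (1.0 - math.exp(-2.0 * alpha * t))
    return mean, variance


def simulate(x0, alpha, a, t, n_paths=2000, dt=0.01, seed=0):
    """Sample the ensemble at time t with small random kicks (Euler-Maruyama).

    Each step applies the determinate drift -alpha * x * dt and a Gaussian
    disturbance of variance 2 * a * dt, mimicking "very rapid but very small"
    perturbations.
    """
    rng = random.Random(seed)
    steps = int(round(t / dt))
    kick = math.sqrt(2.0 * a * dt)
    finals = []
    for _ in range(n_paths):
        x = x0
        for _ in range(steps):
            x += -alpha * x * dt + kick * rng.gauss(0.0, 1.0)
        finals.append(x)
    m = sum(finals) / n_paths
    v = sum((x - m) ** 2 for x in finals) / (n_paths - 1)
    return m, v


exact = mean_and_variance(2.0, 1.0, 0.5, 1.0)
sampled = simulate(2.0, 1.0, 0.5, 1.0)
print("exact  mean, variance:", exact)
print("sample mean, variance:", sampled)
```

Under these assumptions the mean relaxes with time constant $1/\alpha$ and the variance with time constant $1/(2\alpha)$, which is one way to read the remark that the variance approaches its limit "with half the time constant".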
This ensemble can be seen to be equivalent to thermal noise which has been passed through a filter with transfer characteristic [expression illegible in the scan], leading to a power spectrum for the noise [expression illegible in the scan]. To show this, the autocorrelation may be calculated. [Sentences partly illegible in the scan: systems having a given value at $t = 0$ go into a normal distribution at a later time, and from this the autocorrelation is obtained.] The power spectrum is the Fourier transform of this.

[The rest of the memorandum is largely illegible in the scan. The legible fragments refer to the steady state and to a stationary solution made up of exponentially decreasing terms.]

DATA SMOOTHING AND PREDICTION IN FIRE-CONTROL SYSTEMS

By R. B. Blackman, H. W. Bode, and C. E. Shannon (footnote a)

[Footnote a: Bell Telephone Laboratories.]

The problem of data smoothing in fire control arises because observations of target positions are never completely accurate. If the target is located by radar, for example, we may expect errors in range running from perhaps 10 to 50 yards in typical cases. Angular errors may vary from perhaps one to several mils, corresponding, at representative ranges, to yardage errors about equal to those mentioned for range. Similar figures might be cited for the errors involved in optical tracking by various devices. Evidently these errors in observation will generate corresponding errors in the final aiming orders delivered by the fire-control system.

A data-smoothing device is a means for minimizing the consequences of observational errors by, in effect, averaging the results of observations taken over a period of time. The simplest example of data smoothing is furnished by artillery fire at a fixed land target. Here the principal parameter is the range to the target. While individual determinations of the range may be somewhat in error, a reliable estimate can ordinarily be obtained by taking the simple average of a number of such observations. This example, however, is scarcely a representative one for problems in data smoothing generally. The errors involved are small and the averaging process is an elementary one. Moreover, the data-smoothing process is not of very decisive importance in any case, since any errors which may exist in the estimated range can normally be wiped out merely by observing the results of a few trial shots.

More representative problems in data smoothing arise when we deal with a moving target. In this case errors in observational data may be much more serious, since they determine not only the present position of the target but also the rates used in calculating how much the target will move during the time it takes the projectile to reach it. An illustration is furnished by antiaircraft fire against distant airplanes. Suppose, for example, that in observing the target's position we make two errors of opposite sign and a second apart, of 25 yards each. Then the apparent motion of the airplane is in error by 50 yards per second. Since the time of flight of an antiaircraft shell in reaching its target may be as high as 80 seconds or more, such an error might produce a miss of the order of 1 mile. It is clear that in any comparable situation the effect of observational errors in determining the target rate will be much greater than the position error alone would suggest, and the function of the data-smoothing network in averaging the data so that even moderately reliable rates can be obtained as a basis for prediction becomes a critically important one.

Aside from magnifying the consequences of small errors in target position, the motion of the target complicates the data-smoothing problem in two other respects. The first is the fact that it gives us only a brief time in which to obtain suitable firing orders. The total engagement is likely to last for only a brief time, and in any case it is necessary to make use of the data before the target has time to do something different. Thus the averaging process cannot take too long. The second complication results from the fact that the true target position is an unknown function of time rather than a mere constant. Thus many more possibilities are open than would be the case with fixed targets, and the problem of averaging to remove the effects of small errors is correspondingly more elusive.

The intimate relation between data smoothing and target mobility explains why the data-smoothing problem is relatively new in warfare. The problem emerged as a serious one only recently, with the introduction of new and highly mobile military devices. The airplane is, of course, the archetype of such mobile instruments, and we have already mentioned the data-smoothing problem as it appears in antiaircraft fire. Since the relative velocity of airplane and ground is the same whether we station ourselves on one or the other, however, the mobility of the airplane produces essentially the same sort of problem in the design of bombsights also. Another field exists in plane-to-plane gunnery. Although they are somewhat slower, the mobility of such vehicles as tanks and torpedo boats is still considerable enough to create a serious problem here also. Future examples may be centered largely on robot missiles.
It is interesting to notice that a guided missile may present a problem in data smoothing either because it belongs to the enemy, and is therefore something to shoot at, or because it belongs to us, and requires smoothing to correct errors in the data which it uses for guidance. The tendency to higher and higher speeds in all these devices must evidently mean that fire control generally, and data smoothing as one aspect of fire control, must become more and more important, unless war making can be ended.

Very mobile instruments of war, such as the airplane, began to make their appearance in World War I, but there was insufficient time during that war to make much progress with the fire-control problems which such instrumentalities imply. In the interval between World War I and World War II, however, a considerable number of fire-control devices, such as bombsights and antiaircraft computers, were developed. The principal attention in the design of these devices, however, was on the kinematical aspects of the situation. Although a number of them included fairly successful methods of minimizing the effects of observational errors (footnote b), it seems fair to say that in the interval between the two wars there was no general appreciation of the existence of the data-smoothing problem as such.

[Footnote b: Most of these solutions depended upon the use of special types of tracking systems. Examples are found in the use of regenerative tracking in bombsights and antiaircraft computers or in the determination of rates from a precessing gyroscope or an aided laying mechanism in an antiaircraft tracking head. So far as their effect on the data-smoothing characteristics of the overall circuit is concerned, these devices are equivalent to simple types of smoothing networks inserted directly in the prediction system. This is discussed in more detail under the heading "Exponential Smoothing," Section 10.1.]

It follows that the theory of data smoothing advanced in this monograph is the result principally of experience gained in World War II. More specifically, it is the product of the experience of the authors with a series of projects, largely sponsored by Division 7 of NDRC, concerned with the design of electrical antiaircraft directors. In addition, it draws largely on the results of a number of other investigations, also NDRC sponsored. The possible key importance of data smoothing in the design of fire-control systems was recognized by Division 7 early in the course of its activities, and the emphasis placed upon it in a number of projects led to the accumulation of a much larger body of results than might otherwise have been obtained.

Data smoothing is developed here in terms of concepts familiar in communication engineering. This is a natural approach since data smoothing is evidently a special case of the transmission, manipulation, and utilization of intelligence. The other principal, and perhaps still more fundamental, approach to data smoothing is to regard it as a problem in statistics. This is the line followed in the classic work [1] by Norbert Wiener (footnote c). For reasons which are brought out later, Wiener's theory is not used in the present monograph as a basis for the actual design of data-smoothing networks. Because of its fundamental interest, however, a sketch of Wiener's theory is included. The authors' apologies are due for any mutilation to the theory caused by the attempt to simplify it and compress it into a brief space.

The present monograph falls roughly into two dissimilar halves.
The first half, consisting of the first three or four chapters, includes a discussion of the general theoretical foundations of the data-smoothing problem, the best established ways of approaching the problem, the assumptions they involve, and the authors' judgment concerning the assumptions which best fit the tactical facts. In this part may also be included the last chapter, which contains a fragmentary discussion of alternative data-smoothing possibilities lying outside the main theoretical framework of the monograph.

[Footnote c: Wiener is also responsible for providing tools which permit the gap between the statistical and communication points of view to be bridged.]

[Classification marking carried on the original pages: CONFIDENTIAL.]

The rest of the monograph is concerned with the technique of designing specific data-smoothing structures. A fairly elaborate and detailed treatment is given here, in the belief that the problem of actually realizing a suitable data-smoothing device is, in some ways at least, as difficult as that of deciding what the general properties of such a device should be. The technique, as given, draws heavily upon the highly developed resources of electric network theory. For this reason the discussion is couched entirely in electrical language, although the authors realize, of course, that equivalent nonelectrical solutions may exist. For the benefit of readers who may not be familiar with network theory, the monograph includes an appendix summarizing the principles most needed in the main text.

Two further remarks may be helpful in understanding the monograph. The first concerns the relation between data smoothing and the overall problem of prediction in a fire-control circuit.
These two are coupled together in the title of the monograph, and it is clear that the connection between them must be very close, since, as we saw earlier, small irregularities in input data are likely to be serious only as they affect the extrapolation used to determine the future position of a moving target. In the statistical approach, in fact, data smoothing and prediction are treated as a single problem and a single device performs both operations. In the attack which is treated at greatest length in the monograph a certain distinction between data smoothing and prediction can be made. To simplify the exposition as much as possible, the explicit discussion in the monograph is directed principally at data smoothing. This, however, is not intended to suggest that there is any real cleavage between the two problems or that the analysis as developed in the monograph does not also bear, by implication, upon the prediction problem. Any theory of data smoothing must rest ultimately upon some hypothesis concerning the path of the target, and the exact statement of the assumptions to be made is in many ways the most important as well as the most difficult part of the problem. The same assumptions, however, are also involved in the extrapolation to the future position of the target. It is thus impossible to solve the data-smoothing problem without also implying what the general nature of the prediction process will be. For example, the formulation given in Chapter 9 amounts to the assumption that the target path is specified by a set of geometrical parameters corresponding to components of velocity, acceleration, etc. The data-smoothing process centers about the problem of obtaining reliable values for these parameters. To obtain a complete prediction thereafter, it is merely necessary to multiply the parameter values thus obtained by suitable functions of time of flight and add the results to the present position of the target.
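The closing recipe can be written compactly. The following is a sketch only: the symbols $x_P$ (present position), $x_F$ (predicted future position), $t_f$ (time of flight), the parameter names $c_k$, $v$, $w$, and the choice of Taylor-series terms as the "suitable functions of time of flight" are assumptions introduced here for illustration, and the numbers are hypothetical.

```latex
% General form: present position plus parameters times functions of time of flight
x_F = x_P + \sum_{k} c_k\,\varphi_k(t_f)

% Assumed special case: velocity v and acceleration w as the parameters,
% with \varphi_1(t_f) = t_f and \varphi_2(t_f) = t_f^2 / 2
x_F = x_P + v\,t_f + \tfrac{1}{2}\,w\,t_f^{2}

% Hypothetical numbers: x_P = 1000 yd, v = 100 yd/s, w = -2 yd/s^2, t_f = 10 s
x_F = 1000 + (100)(10) + \tfrac{1}{2}(-2)(10)^2 = 1900\ \text{yd}
```

In this form the smoothing problem and the prediction problem share the same unknowns: whatever error remains in $v$ or $w$ after smoothing is multiplied by $t_f$ or $t_f^2/2$ in the prediction.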
The other general remark concerns the tactical criteria used in evaluating the performance of a data-smoothing system. This turns out to be one of the most important aspects of the whole field. It is assumed here that the tactical situation is similar to that of antiaircraft fire against high-altitude bombers in World War II. The defense can be regarded as successful if only a fairly small fraction of the targets engaged are destroyed. On the other hand, the lethal radius of the antiaircraft shell is so small that it is also quite difficult to score a kill. Under these circumstances we are interested only in increasing the number of very well aimed shots. When we combine these assumptions with the path assumptions described in Chapter 9 we are led to the data-smoothing solution formulated here, in preference to the solution obtained with the statistical approach. On the other hand, we might equally well envisage a situation in which the target contained an atomic bomb or some other very destructive agent, so that it becomes very important to intercept it, while the lethal radius of the antiaircraft missile is correspondingly increased, so that great accuracy is not needed for a kill. In this situation our interest would be focused on the problem of minimizing the probability of making large misses, and the solution furnished by the statistical approach would be approximately the best obtainable (footnote d).

[Footnote d: In fairness to the statistical solution it should be pointed out that it is also the best obtainable, without regard to the lethal radius of the shell, if we replace the path assumptions made in Chapter 9 by a "random phase" assumption. The path assumptions in Chapter 9 are almost at the opposite pole from a random phase assumption, and represent a deliberate overstatement, made in order to illustrate the theoretical situation as clearly as possible.]
Chapter 7

GENERAL FORMULATION OF THE DATA-SMOOTHING PROBLEM

One of the principal difficulties in any treatment of data smoothing is that of stating exactly what the problem is and what criteria should be applied in judging when we have a satisfactory solution. It is consequently necessary to embark upon a rather extensive general discussion of the data-smoothing problem before it is possible to consider specific methods of designing data-smoothing structures. This preliminary survey will occupy Chapters 7, 8, and 9. As a first step this chapter will describe two of the general ways in which the data-smoothing problem can be approached mathematically. The formulation of the problem which is finally reached in Chapter 9 is not the one which is most obviously suggested by these approaches. This, however, does not lessen their value in characterizing the problem broadly.

7.1 A PHYSICAL ILLUSTRATION

In an actual fire-control system the data-smoothing problem is usually made fairly specific because of the particular geometry adopted in the computer. It may be helpful to have some particular case in mind as a touchstone in interpreting the general discussion. For this purpose the most appropriate example is furnished by long range land-based antiaircraft fire, since most of the analysis described in this monograph was developed originally for its application to this problem. It is usually assumed in the antiaircraft problem that the target flies in a straight line at constant speed, and in one case at least the computer operates by converting the input data into Cartesian coordinates of target position and differentiating these to find the rates of travel in the several Cartesian directions. These rates form the basis of the extrapolation. The process is illustrated in Figure 1.
The input coordinates are transformed into electrical voltages proportional to $x_P$, $y_P$, and $z_P$, the Cartesian coordinates of present position, in the coordinate converter at the left of the diagram. The extrapolation for $x$ is shown explicitly. It consists essentially in differentiating to find the $x$ component of target velocity, multiplying the derivative by the time of flight $t_f$, and adding the result to $x_P$ to find $x_F$, the predicted future value of $x$. A similar procedure fixes $y_F$ and $z_F$. After the addition of certain ballistic corrections, these three coordinates of future position are transformed into gun aiming orders in the coordinate converter shown at the right of the drawing. This last unit also provides the time of flight required as a multiplier in the extrapolation.

[Figure 1. Caption partly illegible in the scan; it ends "diction circuit". Legible labels: coordinate converter (at input and output), elevation, azimuth, fuze.]

The small irregularities in the input data caused by tracking errors are greatly magnified by the process of differentiation. It is thus necessary to smooth the rates considerably if a reliable extrapolation is to be secured. The data-smoothing network for the $x$ coordinate is represented by $N_1$ in Figure 1. Since the Cartesian velocity components are theoretically constants if the assumption of a straight line course at constant speed is correct, a data-smoothing network in this computer must be essentially an averaging device which gives an appropriately weighted average of the fluctuating instantaneous rate values fed to it. The problem of "smoothing a constant" is given special attention in Chapter 10. Aside from the particular circuit of Figure 1, we may, of course, be required to smooth a constant whenever the prediction is based upon an assumed geometrical course involving one or more parameters which are isolated in the circuit.
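The effect of differentiation on tracking errors, and the benefit of averaging the rates, can be seen in a few lines of arithmetic. The sketch below is a discrete-time illustration, not a model of the electrical networks in Figure 1: the function names, the one-second sampling, the straight-line course of 100 yards per second, the alternating 25-yard errors, the equal weights, and the 20-second time of flight are all assumptions chosen for the example.

```python
def raw_rates(positions, dt):
    """Instantaneous rates by differencing successive observed positions."""
    return [(b - a) / dt for a, b in zip(positions, positions[1:])]


def smoothed_rate(rates, weights=None):
    """Weighted average of the fluctuating rate values (equal weights by default)."""
    if weights is None:
        weights = [1.0] * len(rates)
    return sum(w * r for w, r in zip(weights, rates)) / sum(weights)


def predict(x_present, rate, t_flight):
    """Extrapolation of the kind shown for x: present position plus rate times time of flight."""
    return x_present + rate * t_flight


dt = 1.0                    # seconds between observations (assumed)
true_rate = 100.0           # yards per second along x (assumed)
t_flight = 20.0             # seconds (assumed)

true_x = [true_rate * k * dt for k in range(11)]
# Tracking errors of 25 yards with alternating sign, one second apart.
observed = [x + (25.0 if k % 2 == 0 else -25.0) for k, x in enumerate(true_x)]

rates = raw_rates(observed, dt)          # alternates between 50 and 150
rate_hat = smoothed_rate(rates)          # simple average of the ten rates

x_future_true = true_x[-1] + true_rate * t_flight
miss_raw = predict(observed[-1], rates[-1], t_flight) - x_future_true
miss_smoothed = predict(observed[-1], rate_hat, t_flight) - x_future_true
print(rates[:4], rate_hat, miss_raw, miss_smoothed)
```

With these numbers each raw rate is off by 50 yards per second, so the unsmoothed prediction misses by about a thousand yards, while the averaged rate leaves only the 25-yard error in present position. Unequal weights could be passed to `smoothed_rate` to imitate an "appropriately weighted" average.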
In addition to smoothing the rates we can, if we like, attempt to smooth the irregularities in present position also. A network to accomplish this purpose is indicated by the broken line structure $N_2$ in Figure 1. Of course, in dealing with the present position we are no longer smoothing a constant, but suitable structures can be obtained by methods described later. However, the effect of tracking errors in the present position circuit is so much less than it is in the rate circuit that $N_2$ can generally be omitted.

Geometrical assumptions of the sort implied in Figure 1 are helpful in visualizing the problem, and they are of course of critical importance in determining what the final data-smoothing device will be. It is important not to make explicit assumptions of this kind too early in the formal analysis, however, since the meaning of such assumptions is one of the aspects of the general problem which must be investigated. For example, it is apparent that no airplane in fact flies exactly a straight line, nor flies a straight line for an indefinite period. In detail, the solution of the data-smoothing problem depends very largely on how we treat these departures from the idealized straight line path. For the present, consequently, it will be assumed that the input data are presented to the data-smoothing and predicting devices in terms of some generalized coordinates, the nature of which we will not inquire into too closely. A given coordinate might, for example, be a velocity, a radius of curvature, an angle of dive or climb, or any other quantity which would be directly useful in making a prediction, or it might be a simple position coordinate such as an azimuth or an altitude.

The data-smoothing and predicting operation itself is assumed to be performed by linear invariable devices.
Aside from the fact that this assumption is, of course, a tremendously simplifying one, it also fits the data-smoothing problem very nicely, as the problem is formulated in this chapter. With other formulations, however, it appears that somewhat better results may be obtainable from variable devices or devices including more or less radical amounts of nonlinearity. These possibilities are discussed briefly in Chapter 14.

7.2 DATA SMOOTHING AND PREDICTION

Figure 1 illustrates a distinction between two possible methods of looking at the data-smoothing problem which it is advisable to establish for future purposes. In describing the $x$ system in Figure 1 we laid emphasis on the particular networks $N_1$ and $N_2$. It is clear, however, that the complete $x$ circuit with input $x_P$ and output $x_F$ is a network having overall transmission properties which can be studied. Since $t_f$ will normally vary with time, the network is not, strictly speaking, an invariable one, but the changes of $t_f$ are ordinarily too slow to make this an essential consideration.

When it is necessary to make a distinction between these points of view, a network such as $N_1$, which is merely an element in the prediction process, will be called a data-smoothing structure. An overall circuit, providing data smoothing and prediction in one step, will be called a data-smoothing and prediction network, or simply a prediction network. Although these points of view have been illustrated for rectangular coordinates, they obviously apply also in many other situations. For example, we might go so far as to apply the overall point of view to a complete circuit from input azimuth, say, to output azimuth.

Both points of view are taken from time to time in the monograph. When possible, however, principal attention has been given to the limited data-smoothing problem. This tends to simplify the discussion, since the limited problem is evidently more concrete than the overall prediction problem.
Moreover, it permits us to deal lightly with such questions as the particular choice of coordinates in which the smoothing operations are conducted, since it assumes that the general kinematical framework of prediction has already been decided upon. On the other hand, the overall point of view is more effective in certain situations, and it is the only natural one in the statistical treatment described in the next section.

7.3 DATA SMOOTHING AS A PROBLEM IN TIME SERIES

The most direct and perhaps the most general approach to data smoothing consists in regarding it as a problem in time series. This is the approach used by Wiener in his well-known work [1]. It essentially classifies data smoothing and prediction as a branch of statistics. The input data, in other words, are thought of as constituting a series in time similar to weather records, stock market prices, production statistics, and the like. The well-developed tools of statistics for the interpretation and extrapolation of such series are thus made available for the data-smoothing and prediction problem.

To formulate the problem in these terms, let $f(t)$ represent the true value of one of the coordinates of the target and let $g(t)$ represent the observational error. Then $f(t)$ and $g(t)$ are both time series in the sense just defined. The set of all such functions corresponding to the various possible target courses and tracking errors forms an ensemble of time series, or a statistical population. One can imagine that a large number of particular functions $f(t)$ and $g(t)$ have been recorded, each with a frequency proportional to its actual frequency of occurrence. Wiener assumes that they are stationary, that is, that the statistical properties of the ensemble are independent of the origin of time. This, of course, implies that both functions exist from $t = -\infty$ to $t = +\infty$.
We will sometimes find it more convenient to make the assumption that the two functions vanish after some fixed, but sufficiently remote, points on the positive and negative real $t$ axis.[*]

The input signal to the computer is of course $f(t) + g(t)$. If we assume that the coordinate in question represents a position, the quantity we wish to obtain is $f(t + t_f)$, where $t_f$ represents the prediction time. If the coordinate is a rate, we are interested in an average value of $f(t)$ over the prediction interval. This complicates the mathematics somewhat, but does not essentially affect the situation.

[*] This is done for technical mathematical reasons. We shall later have occasion to consider the Fourier transforms of $f(t)$ and $g(t)$, and, to have well-defined transforms, the integrals of the squares of the two functions, from $t = -\infty$ to $t = +\infty$, should be finite. This would not happen under the "stationary" assumption. Wiener avoids the difficulty by introducing what he calls a generalized harmonic analysis, but this method is far too complicated to be treated in a brief sketch like the present.

We shall not, of course, be able to predict $f(t + t_f)$ perfectly accurately. Let the predicted value be represented by $f^*(t + t_f)$. In virtue of our assumption that the data-smoothing and prediction circuit is to be a linear invariable network, the relation between $f^*(t + t_f)$ and the total input signal $f(t) + g(t)$ can be written as

$$f^*(t + t_f) = \int_{-\infty}^{t} \left[ f(\sigma) + g(\sigma) \right] dK(t - \sigma) \qquad (1)$$

where $dK$ represents the effect of the data-smoothing and prediction circuit. Comparison to equations (2) and (5) of Appendix A shows that $K$ is, in fact, the indicial admittance of this circuit. The particular problem to be solved is of course that of finding a shape for the function $K$ which will make $f^*(t + t_f)$ the best possible estimate of $f(t + t_f)$.

The fact that the upper limit of integration in equation (1) is taken as $\sigma = t$ is particularly to be noted.
It corresponds to the fact that in making a prediction we are entitled to use only the input data which has accumulated up to the prediction instant. This restriction will be conspicuous in the next chapter, where the time-series analysis is completed.

7.4 THE AUTOCORRELATION

The principal statistical tool used in studying equation (1) is the so-called autocorrelation. Under the "stationary" assumption the autocorrelation for $f(t)$ is defined by

$$\phi_1(\tau) = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} f(t + \tau) f(t)\, dt. \qquad (2)$$

We can obtain a normalized autocorrelation, which is more convenient for some purposes, by dividing by $\phi_1(0)$. This gives

$$\phi_{1N}(\tau) = \frac{\phi_1(\tau)}{\phi_1(0)} = \lim_{T \to \infty} \frac{\int_{-T}^{T} f(t + \tau) f(t)\, dt}{\int_{-T}^{T} [f(t)]^2\, dt}. \qquad (3)$$

If we assume that $f(t)$ in fact vanishes for sufficiently large positive or negative values of $t$, the limit sign can be disregarded and $\phi_{1N}(\tau)$ becomes simply

$$\phi_{1N}(\tau) = \frac{1}{W} \int_{-\infty}^{\infty} f(t + \tau) f(t)\, dt \qquad (4)$$

where $W = \int_{-\infty}^{\infty} [f(t)]^2\, dt$ and represents the total "energy" in the time series $f(t)$.

Precisely similar expressions can be set up for the autocorrelation $\phi_2(\tau)$ or $\phi_{2N}(\tau)$ of the observational error function $g(t)$. In a general case we might also have to worry about a possible cross correlation between $f(t)$ and $g(t)$. This would be represented by a cross-correlation function $\phi_{12}(\tau)$, obtained by integrating the product $f(t + \tau) g(t)$. In practical fire control, however, it can be assumed that the correlation between target course and tracking errors is small enough to be neglected.

As a simple example of the calculation of an autocorrelation we may assume that $f(t) = \sin \omega t$. Then

$$\phi_1(\tau) = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} \sin \omega(t + \tau) \sin \omega t\, dt = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} \tfrac{1}{2} \left[ \cos \omega\tau - \cos(2\omega t + \omega\tau) \right] dt = \tfrac{1}{2} \cos \omega\tau, \qquad (5)$$

since the term $\cos(2\omega t + \omega\tau)$ will contribute nothing in the limit.

The maximum value of $\phi_1(\tau)$ in (5) is found at $\tau = 0$. This is to be expected, since obviously the correlation between identical values of the function is the best possible.
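The sinusoidal example can be checked numerically by evaluating the defining average over a long but finite interval. This is only a sketch: the half-length $T$ of the averaging interval, the step count, the frequency, and the function name are assumptions made for illustration, and the finite interval leaves a small residual from the double-frequency term.

```python
import math


def autocorrelation(f, tau, T=200.0, steps=200_000):
    """Approximate (1/2T) * integral from -T to T of f(t + tau) * f(t) dt (midpoint rule)."""
    h = 2.0 * T / steps
    total = 0.0
    for k in range(steps):
        t = -T + (k + 0.5) * h
        total += f(t + tau) * f(t)
    return total * h / (2.0 * T)


omega = 2.0  # assumed angular frequency, radians per second


def f(t):
    return math.sin(omega * t)


for tau in (0.0, 0.5, 1.0, 3.0):
    numeric = autocorrelation(f, tau)
    closed_form = 0.5 * math.cos(omega * tau)
    print(f"tau = {tau:3.1f}   numeric = {numeric:+.4f}   (1/2) cos(omega tau) = {closed_form:+.4f}")
```

The numerical values follow $\tfrac{1}{2}\cos\omega\tau$ closely and, as the printed table suggests, do not die away as the lag grows.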
What is exceptional about the present result is the fact that $\phi_1(\tau)$ is not small for all large $\tau$'s. This is fundamentally a consequence of the fact that we chose an analytic expression for $f(t)$, so that the relation between two values of the function is completely determinate, no matter how great the difference between their arguments. In a more representative time series, involving a certain amount of statistical uncertainty, we would expect $\phi_1(\tau)$ to approach zero as $\tau$ increases, reflecting the increasing importance of statistical dispersion as the time interval becomes greater.

The significance of the autocorrelation function for data smoothing and prediction is obvious without much study. Thus, suppose for simplicity that the observational error $g(t)$ is zero. Then the autocorrelation $\phi_1(\tau)$ is the only one involved. It is a measure of the extent to which the true target path "hangs together" and is thus predictable. For example, in weather forecasting it is a well-known principle that in the absence of any other information it is a reasonably good bet that tomorrow's weather will be like today's but that the reliability of such a prediction diminishes rapidly if we attempt to go beyond two or three days. This would correspond to an autocorrelation function which is fairly large in the neighborhood of $\tau = 0$, but diminishes rapidly to zero thereafter.

In a similar way the autocorrelation of the observational error $g(t)$ represents the extent to which this error hangs together. In this case, however, a high correlation is exactly what we do not want. Thus, if $\phi_2(\tau)$ vanishes rapidly as $\tau$ increases from zero, closely neighboring values of $g$ are quite uncorrelated, and we need only average the input data over a short interval in the immediate past in order to have most of the observational errors averaged out.
If $\phi_2(\tau)$ is substantial for a much longer range, on the other hand, a much longer averaging period is necessary, with correspondingly greater uncertainties in the value obtained for $f(t)$.

7.5 THE LEAST SQUARES ASSUMPTION

The autocorrelation function does not in itself suffice to determine a time series completely. For example, it is easily seen that the functions $\sin t + \sin 2t$ and $\sin t + \cos 2t$ have the same autocorrelation in spite of the fact that they represent waves of quite different shape. The autocorrelation function, however, has a peculiar importance in the fact that under many circumstances it is the only piece of information about the time series which we need to know.

The significance of the autocorrelation becomes apparent as soon as we investigate the error in prediction. In many mathematical situations involving linear systems it is convenient to deal with the square of the error rather than with the error itself, since a first variation in the error squared expression gives a linear relationship in the quantities of direct interest. We will deal with the square of the error here. If $E$ represents the instantaneous error, $f^*(t + t_f) - f(t + t_f)$, the mean square error over a long period of time is evidently

$$\overline{E^2} = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} \left[ f^*(t + t_f) - f(t + t_f) \right]^2 dt = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} [f(t + t_f)]^2\, dt - \lim_{T \to \infty} \frac{1}{T} \int_{-T}^{T} f(t + t_f) f^*(t + t_f)\, dt + \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} [f^*(t + t_f)]^2\, dt. \qquad (6)$$

The first integral in equation (6) can be evaluated immediately. From (2) it is $\phi_1(0)$. To evaluate the second integral replace $f^*(t + t_f)$ by its definition from (1), taking as the variable of integration the time $\tau$ elapsed since each datum was received. This gives

$$-\lim_{T \to \infty} \frac{1}{T} \int_{-T}^{T} f(t + t_f)\, dt \int_{0}^{\infty} \left[ f(t - \tau) + g(t - \tau) \right] dK(\tau) = -\lim_{T \to \infty} \frac{1}{T} \int_{0}^{\infty} dK(\tau) \int_{-T}^{T} \left[ f(t + t_f) f(t - \tau) + f(t + t_f) g(t - \tau) \right] dt$$

if we reverse the order of integration.
Since we assume that $f$ and $g$ are uncorrelated, however, the product $f(t + t_f) g(t - \tau)$ in this expression makes no contribution to the final result, and by replacing the integral of $f(t + t_f) f(t - \tau)$ by its value in terms of $\phi_1$ the expression as a whole can be written as

$$-2 \int_{0}^{\infty} \phi_1(t_f + \tau)\, dK(\tau).$$

The third integral in (6) can be simplified in similar fashion. The final result becomes

$$\overline{E^2} = \phi_1(0) - 2 \int_{0}^{\infty} \phi_1(t_f + \tau)\, dK(\tau) + \int_{0}^{\infty} dK(\tau) \int_{0}^{\infty} \left[ \phi_1(\tau - \sigma) + \phi_2(\tau - \sigma) \right] dK(\sigma). \qquad (7)$$

The only quantities appearing in equation (7) are the autocorrelations, $\phi_1$ and $\phi_2$, of the true target path and the observational error, and the function $K$ which specifies the data-smoothing structure. The theoretical problem with which we are confronted is evidently that of choosing $K$ to make the mean square error as small as possible for any given $\phi$'s. This problem will not be attacked here, although a solution obtained by a somewhat indirect method is presented in the next chapter. The principal reason for deriving equation (7) is to demonstrate the very important fact that the mean square error depends only upon the two autocorrelations. No other characteristics of the input data need be considered.

It will be recalled that the mean square criterion was introduced originally on the ground of mathematical convenience. This leaves unsettled the question of how good a measure of performance for a data-smoothing network it actually is. This is a critical question, since upon it depends the validity of the whole approach outlined in this chapter. A priori, the least squares criterion is a dubious one since it gives principal weight to large errors. In fire control we are normally interested only in shots which are close enough to register as hits. If a shot misses it makes little difference whether the miss is large or small.
The merits of the least squares criterion are considered in more detail in Chapter 9, where the conclusion is reached that the criterion is probably adequate for many problems but needs to be supplemented or replaced in others, including the special case of heavy antiaircraft fire to which particular attention is given in this monograph. Pending the discussion in Chapter 9, the least squares criterion will be assumed to be a valid one, with the understanding that the analysis is intended primarily for its value in contributing to the general understanding of the data-smoothing problem rather than as a means of fixing the exact proportions of an optimal smoothing network.

7.6 DATA SMOOTHING AS A FILTER PROBLEM

The time-series approach to data smoothing is closely associated with another which at first sight may seem quite different. This second approach is suggested by the procedures used in communication engineering. Here the signals, be they voice, music, television, or what not, are again time series. Instead of dealing with actual signals varying in a more or less irregular and random manner with time, however, it is customary to deal with their equivalent steady-state components on the frequency spectrum.[b]

The analysis of data smoothing can conveniently be approached by supposing that both the true path of the target and the effects of tracking errors are represented, in a similar way, by their frequency spectra. When the situation is presented in this way, however, there is an obvious analogy between the problem of smoothing the data to eliminate or reduce the effect of tracking errors and the problem of separating a signal from interfering noise in communication systems. We may take as an example of the latter the transmission of voice or music by ordinary radio over fairly long distances, so that the effects of static interference are appreciable.
In such a system a reasonable separation of the desired signal from the static can be obtained by means of a filter. In a representative situation an appropriate filter might transmit frequencies up to perhaps 2,000 or 3,000 cycles per second,[c] while rejecting higher frequencies.

The choice of any specific cutoff, such as 2,000 or 3,000 c, in the radio system depends upon a compromise between conflicting considerations. Both speech or music and static normally include components of all frequencies which can be heard by the human ear. Thus, suppressing any frequency range below the limits of audibility, at perhaps 10,000 or 20,000 c, will injure the signal to some extent. The intensity of the signal components, however, diminishes rapidly above 2,000 or 3,000 c, while the energy of the static interference is more evenly distributed over the spectrum. Thus, by passing only the first 2,000 or 3,000 c, we can retain most of the signal while rejecting most of the noise. Naturally, the exact dividing line will depend upon the relative levels of signal and noise power. If the static interference is quite weak, for example, it would be worth while to transmit a considerably wider band in order to retain a more nearly perfect signal. If the static level is extremely high, on the other hand, it would be necessary to transmit a still narrower band at the cost of greater mutilation of the signal.

[b] The review of communication theory given in Appendix A shows how this equivalence is established by Fourier or Laplace transform methods.

[c] In practice, of course, the filtering would probably take place in the radio-frequency circuits, but it is more convenient here to think of it occurring in the demodulated output.

The separation of the true path of a target from the observed path including tracking errors, as a preliminary to prediction of the future position of the target, presents an approximately analogous situation.
Again the spectrum of the "signal" or true path is concentrated principally in a low-frequency band, in most instances, while the energy of tracking errors or "noise" appears principally at considerably higher frequencies. Thus the two can be separated by a low-pass filter. The separation, however, is not complete since some components of the signal spectrum extend into the noise region. Thus the smoothing process must be accompanied by some mutilation of the signal, and the optimum compromise is again attained from a filter which transmits a relatively broad band when the tracking errors are of low intensity and a much narrower band when they are large.

In these terms the most obvious difference between the data-smoothing problem and the static interference problem in the radio system is in the order of magnitude of the frequencies involved. They are roughly 10,000 times smaller in the data-smoothing case. Thus, the typical signal band in a fire-control system may cover a few tenths of a cycle per second, in comparison with a useful band of 2,000 or 3,000 c in a radio system, and the spectrum of tracking errors or noise, with representative tracking devices, includes appreciable components up to perhaps 2 or 3 c, in comparison with a total effective noise band in the radio system extending to the limits of audibility at perhaps 20,000 c.

This analogy between data smoothing and the filtering problems which appear in ordinary communication systems transmitting speech or music must of course not be carried too far. For example, previous experience with communication filters is of no help in fixing in detail the cutoff or attenuation characteristic of the data-smoothing filter, since in communication systems these choices depend on psychological considerations of no relevance in the fire-control problem.
Methods of determining the best rules for proportioning a data-smoothing filter, therefore, remain to be determined. We may also notice that, whereas the time-series approach was of the data-smoothing and prediction type, the filter approach emphasizes data smoothing only. The addition of the prediction function can be expected to change materially the overall characteristics of the circuit. Neither of these remarks, however, robs the filter approach of its value as a simple way of thinking about the problem qualitatively.

7.7 RELATION BETWEEN TIME-SERIES AND FILTER APPROACHES

The time-series and filter methods of looking at data smoothing are related to one another by the fact that the autocorrelation can be computed from the amplitude spectrum, or vice versa, by Fourier transform means. Consider, for example, the Fourier transform of the autocorrelation. If we make use in particular of (4) we have

$$\begin{aligned}
\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \phi_{1N}(\tau) e^{-i\omega\tau}\, d\tau
&= \frac{1}{\sqrt{2\pi}\, W} \int_{-\infty}^{\infty} e^{-i\omega\tau}\, d\tau \int_{-\infty}^{\infty} f(t + \tau) f(t)\, dt \\
&= \frac{1}{\sqrt{2\pi}\, W} \int_{-\infty}^{\infty} f(t)\, dt \int_{-\infty}^{\infty} f(t + \tau) e^{-i\omega\tau}\, d\tau \\
&= \frac{1}{\sqrt{2\pi}\, W} \int_{-\infty}^{\infty} f(t) e^{i\omega t}\, dt \int_{-\infty}^{\infty} f(t + \tau) e^{-i\omega(t + \tau)}\, d\tau \\
&= \frac{\sqrt{2\pi}}{W} \left| F(\omega) \right|^2
\end{aligned} \qquad (8)$$

where

$$F(\omega) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(t) e^{-i\omega t}\, dt = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(t + \tau) e^{-i\omega(t + \tau)}\, d\tau. \qquad (9)$$

$F(\omega)$ is of course the steady-state spectrum of the signal $f(t)$. Equation (8) thus states that the Fourier transform of $\phi_{1N}$ is equal to a constant times the square of the amplitude of the steady-state spectrum. The amplitude squared spectrum is, however, a measure of the power per cycle. The relation is therefore equivalent to the statement that the autocorrelation and power spectrum are Fourier transforms of each other.

Since we have already established the fact that the mean square error in prediction depends only on the autocorrelation, this analysis enables us to conclude immediately that the mean square error can also be calculated from the power spectra of the signal and noise. It is entirely independent of the phase relations in either signal or noise.
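The transform relation, and the independence from phase, can be checked on sampled data. The sketch below is a discrete, periodic analogue of the continuous relation and rests on assumptions of its own: one period of each wave is sampled at 64 points, a circular (wrap-around) autocorrelation is used, and the discrete Fourier transform carries its own normalization in place of the $1/\sqrt{2\pi}$ factors above. The function names are hypothetical. The two test waves are the pair $\sin t + \sin 2t$ and $\sin t + \cos 2t$ cited earlier in the chapter as having the same autocorrelation.

```python
import cmath
import math


def dft(x):
    """Plain discrete Fourier transform, X[m] = sum over k of x[k] exp(-2 pi i m k / n)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n))
            for m in range(n)]


def circular_autocorrelation(x):
    """Direct average of x[k + lag] * x[k] over one period."""
    n = len(x)
    return [sum(x[(k + lag) % n] * x[k] for k in range(n)) / n for lag in range(n)]


def autocorrelation_from_power_spectrum(x):
    """Same quantity obtained by inverse-transforming the power spectrum |X[m]|^2."""
    n = len(x)
    power = [abs(c) ** 2 for c in dft(x)]  # phase information is discarded here
    return [sum(power[m] * cmath.exp(2j * math.pi * m * lag / n) for m in range(n)).real / n**2
            for lag in range(n)]


n = 64
t = [2.0 * math.pi * k / n for k in range(n)]
a = [math.sin(u) + math.sin(2.0 * u) for u in t]
b = [math.sin(u) + math.cos(2.0 * u) for u in t]

r_a = circular_autocorrelation(a)
r_b = circular_autocorrelation(b)
r_a_spec = autocorrelation_from_power_spectrum(a)

print("largest difference between the two wave shapes:  ",
      max(abs(p - q) for p, q in zip(a, b)))
print("largest difference between their autocorrelations:",
      max(abs(p - q) for p, q in zip(r_a, r_b)))
print("direct vs. power-spectrum autocorrelation:        ",
      max(abs(p - q) for p, q in zip(r_a, r_a_spec)))
```

The two waves differ visibly in shape, yet their autocorrelations agree to rounding error, and the autocorrelation recovered from the power spectrum alone matches the direct calculation.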
The phase characteristics of the data-smoothing network, which operates on the signal after a specific wave shape has been established, are, of course, still of consequence.

7.8 PHYSICAL AND TACTICAL CONSIDERATIONS

Thus far the material which has been presented has been primarily mathematical. It has consisted, in other words, of outlines of general analytical methods which are available for use with the data-smoothing problem. It is also possible to approach the problem in a much more concrete fashion. It is obvious that by giving thought to the details of the physical characteristics of tracking units and targets, and to the tactical situations with which we expect to deal, it should be possible to draw a number of specific conclusions about the problem as a whole. In a general theory of the design and tactical use of fire-control apparatus such an approach might well be a primary one. It is scarcely possible to follow it in detail in the present discussion. The following paragraphs, however, indicate some of the kinds of considerations which can be brought into the problem in this way. It will be seen that they tend to modify the strictly mathematical approach, partly by qualifying to some extent the assumptions made in the mathematics, and partly by tending to give much more emphasis to particular aspects of the problem than would appear in a general analytic outline.

Choice of Coordinates

One of the most obvious omissions in the general analysis thus far is any consideration of the choice of coordinates in which the data smoothing is to take place. So far as either the statistical or filter theory is concerned, the coordinates in the data smoother may represent either the original tracking data or any transformation of them. The fact that there is actually something to be decided here, however, is easily seen from the long-range antiaircraft problem.
The input tracking coordinates for antiaircraft would normally be azimuth, elevation, and slant range. If the airplane flies in a straight line roughly overhead, the general shape of the azimuth and the azimuth rate as functions of time are given by the curves in Figure 2. The curves become indefinitely steeper as the target path approaches the zenith, and it will be seen that if the approach is reasonably close, either the azimuth or the azimuth rate must include a very substantial amount of high-frequency energy. Since the possibility of an effective separation between the signal and noise in the filter approach depends upon the assumption that the signal components are of quite low frequency with respect to the noise, the presence of this high-frequency energy is evidently serious.

[Figure 2. Azimuth and azimuth rate for crossing target. Axes: azimuth (mils) and azimuth rate (mils/sec) against time (secs).]

When the target describes a violently evasive path the signal spectrum must naturally include substantial high-frequency components, whatever the coordinate system may be. The high-frequency components indicated in Figure 2, however, are due to the fact that the target path happens to pass almost over the director and are essentially superimposed upon the high-frequency components which reflect the complexity of the target path itself. It is clear as a matter of principle that an acceptable coordinate system for data smoothing should not introduce frequency components which depend upon such accidental factors as the location and orientation of the coordinate system. The rectangular system mentioned in connection with Figure 1 evidently meets this condition; so also does the "intrinsic" system described in the next section.

Physical Limitations of Target or Tracker

We may also approach the data-smoothing question by a consideration of the motions which are physically possible either in the target or in the tracking device.
In the heavy antiaircraft problem, for example, there are substantial physical limitations on the performance possibilities of present-day aircraft. We can be quite sure that any motion incompatible with these limitations is necessarily a tracking error and can be removed from the incoming data. Naturally, these limitations must appear in the power spectrum of the signal if they affect the mean square error in prediction, so that their existence in no way disputes the mathematical framework we have set up. Consideration of the physical factors which produce them, however, may permit them to be established more easily or in more clear-cut fashion than would be possible from a statistical examination of target records alone.

The limitations on airplane performance can be stated most simply when the motion of the airplane is expressed in so-called intrinsic coordinates. These are the speed of the airplane, its heading, and its angle of dive or climb. The maneuvering possibilities of a conventional airplane in these three directions are quite unequal. By banking sharply it can maneuver violently to the right and left and thus make quick changes in heading. The possibilities of maneuvering up and down, however, are considerably less, particularly for a heavy airplane, where there are usually restrictions on the maximum angle of dive or climb which can be assumed. The possibilities of quickly changing the speed of the airplane, finally, are almost nil. The thrust of an airplane propeller is so small in comparison with the mass of the airplane that only small accelerations are possible.[*]

Thus the optimum filters for the three coordinates should be different. The one for speed can have a very narrow band, since most of the signal energy for this coordinate occurs at very low frequencies.
The optimum band for the angle of dive or climb, however, should be larger (unless it turns out that pilots seldom make use of maneuvering possibilities in this direction) and the one for the heading larger still. In this ability to discriminate among the various possible directions of motion the intrinsic coordinate system is evidently an improvement even on the rectangular system.

Settling Time

Another aspect of the data-smoothing problem which has not been given conspicuous attention in the purely mathematical discussion is the fact that in an actual tactical situation questions of elapsed time are of great importance. Engagements usually begin suddenly and last for a comparatively brief period, and it is important to find a data-smoothing scheme which provides adequate firing data as quickly as possible after an engagement starts. A situation essentially similar to the beginning of an engagement may also be presented whenever the target makes a sudden change of course or whenever it is necessary to shift from one target to another in a given attacking body. The time required for a computer to give usable output data after any of these events is its so-called "settling time," and is one of the most important parameters of any data-smoothing system. It is possible to make rough estimates of settling time by indirect means in both the statistical and filter theories of data smoothing, but no explicit consideration of necessary time lapses appears in either theory. Evidently, the fundamental fault lies with the "stationary" assumption.

[*] This ignores the possibility of changing the speed through gravitational forces. Since these possibilities are linked to the angle of dive or climb, however, they can be predicted. This has actually been done in one experimental computer.
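The notion of settling time can be made concrete with a simple averaging smoother. The following is a hypothetical sketch under assumed conditions, not a description of any actual computer: the smoother is an equal-weight moving average, the input is a rate that jumps abruptly from 0 to 1 (standing in for a sudden change of course), the sampling interval is 0.1 second, and "settled" is taken to mean within 5 per cent of the new value.

```python
def moving_average(samples, window):
    """Equal-weight average of the most recent `window` samples."""
    out = []
    for k in range(len(samples)):
        recent = samples[max(0, k - window + 1): k + 1]
        out.append(sum(recent) / len(recent))
    return out


def settling_time(output, final_value, dt, tolerance=0.05):
    """Time after which the output stays within tolerance * |final_value| of final_value."""
    last_bad = -1
    for k, y in enumerate(output):
        if abs(y - final_value) > tolerance * abs(final_value):
            last_bad = k
    return (last_bad + 1) * dt


dt = 0.1                 # assumed sampling interval, seconds
pre, post = 300, 400     # samples before and after the sudden change
rate = [0.0] * pre + [1.0] * post

for window in (50, 200):
    smoothed = moving_average(rate, window)
    # Measure from the instant of the change, i.e. from sample index `pre`.
    t_settle = settling_time(smoothed[pre:], 1.0, dt)
    print(f"averaging window {window * dt:.0f} s -> settling time about {t_settle:.1f} s")
```

In this sketch the settling time is nearly the full length of the averaging window, so a longer window, which averages out more of the tracking error, also delays usable output by a corresponding amount.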
Effect of Human Factors

Aside from the conditions on target performance which arise from the physical characteristics of the target itself, there are others which are due to the fact that the target is under the control of a human being with a definite purpose. The language of the statistical and filter methods is broad enough to cover almost any situation. It tends to suggest, however, that the typical target paths with which we deal are the relatively structureless consequences of random physical forces. The intervention of purposive human behavior, on the other hand, tends to give paths which fall into more or less definite patterns. A simple illustration is furnished by the argument which is frequently offered in defense of the straight line assumption in dealing with antiaircraft defense against heavy bombers. It is contended that while the targets may in fact engage in substantial evasive maneuvers during most of their flight, there will always be a substantial period during the bombing run in which they must fly very straight in order to achieve bombing accuracy. On the basis of ordinary probability we would of course expect substantial straight line segments quite infrequently if the course as a whole shows marked dispersion, and the intervention of the human pilot thus provides a higher degree of structure than one would expect in a corresponding situation dominated by purely natural factors.

A broader example is furnished by a comparison of two airplanes, or perhaps more simply of two boats, one of which is under the control of a human operator, while in the other the steering controls are lashed in a neutral position. Both boats, say, may be expected to experience small variations of course due to the random effects of wind and waves upon them. Over a short period of time the observed motions of the two boats should be substantially identical.
In the case of the boat with the lashed helm these random variations will tend to accumulate, so that it is possible to make a reasonable prediction of the position of the boat for only a comparatively short distance in the future. In the boat with the human steersman, on the other hand, we may expect corrections to be applied as soon as the random effects become large, so that the boat tends to retain the same general course and it is possible to predict its position hours or even days later from a relatively brief observation.

Neither of these illustrations is inconsistent with the mathematical framework laid down earlier in the chapter, in a purely theoretical sense. For example, the bombing run illustration merely states that because of the presence of the human operator there are definite phase relations in the input signal. As we have seen, such relations can exist without affecting computations based on mean square error. The comparison between the piloted and pilotless boats can be interpreted as the result primarily of differences in the signal power spectrum. In the case of the pilotless boat, for example, the signal occupies a fairly continuous low-frequency band, while in the case of the piloted boat it must be regarded as concentrated very closely around zero frequency, so that it is approximately a line spectrum superimposed on a continuous one. The formal mathematical theory covers also such cases as these.

The point of this discussion, however, is that the mathematical theory, although it is sufficiently general in a formal sense, fails to differentiate between such situations as those just described and the more shapeless sort involving continuous spectra with random phase relations, even if the special features in these situations may be the controlling factors in determining the actual probability of hitting. If we could believe the bombing run hypothesis, for example, and had a sufficiently accurate computer and gun, we could expect to score a hit in every engagement, no matter how large the mean square error might be. More generally, it is probably only the tendency of targets to exhibit "line spectra" which prevents the real probability of a kill, small at best, from becoming microscopic. It is necessary to lay special emphasis on these factors in order to keep the overall fire control picture in perspective.

CRITERION OF PERFORMANCE

Last on this list of doubts about the statistical and filter theories, we may mention the least squares criterion of accuracy. This was discussed before, but it is mentioned again as a matter of emphasis, and because of its close relation with the factors we have just discussed. For example, the bombing run illustration obviously represents one situation in which the mean square error is not a good guide to the actual probability of scoring a hit.

Chapter 8

STEADY-STATE ANALYSIS OF DATA SMOOTHING

It was shown in the previous chapter that both the statistical and filter theory ways of looking at the data-smoothing problem lead naturally to an analysis in terms of the power spectra of the signal and noise. The phase relations are not important as long as we accept the mean square error as a criterion of performance. The inadequacies of the mean square criterion will finally force us to abandon the steady-state attack in favor of a direct analysis in terms of the wave shapes of some assumed signals. The steady-state attack is nevertheless a very useful one. This chapter will consequently continue the analysis from this point of view. It will be assumed as heretofore that the heavy antiaircraft problem is the particular subject of interest.
A large part of the discussion hinges upon the conditions which must be satisfied by the external characteristics of an electrical network if it is to be capable of physical realization in any way whatever. These limitations and the characteristics which may be postulated for physical networks are decisive since, in the absence of such restrictions, no limits could be set upon the performance which might be expected from data-smoothing and predicting circuits. The facts about physically realizable networks which we shall find of most use are summarized below, but the reader not familiar with this field is urged to read also the account given in Sections A.9 and A.10, Appendix A.

The conditions which must be satisfied by physically realizable networks can be stated in either transient or steady-state terms. In transient terms they are expressed most simply by the statement that the response of a physical network to an impulsive force must be zero up to the time the force is applied. Thus the network has no power to predict a purely arbitrary event. That is, it has no way of foreseeing whether or not an impulse is actually going to be applied to it. This characteristic of physical networks is taken as a postulate.

The steady-state limitations on physical networks are expressed in terms of their attenuation and phase characteristics. They may be derived either from the transient specification or from the postulate that a physical network must be stable. There are no important limitations to be placed upon the attenuation and phase characteristics of physical networks as long as we deal with these characteristics separately, but there are very severe limitations on the phase characteristic which can be associated with any given attenuation characteristic or vice versa.
In particular, when the attenuation characteristic is prescribed, there is a definite formula for calculating the unique limiting phase characteristic with which it may be associated. This is the so-called "minimum phase" characteristic because any other physical network having the postulated attenuation characteristic must have as great or greater phase shift at every frequency. As we shall see later, this greater phase characteristic would correspond to longer lags in obtaining usable data, so that the minimum phase characteristic is the optimum for a data-smoothing network. The minimum phase characteristic has the additional important property that not only does it specify the transfer admittance of a physical network, but the reciprocal of that transfer admittance can also be realized by a physical structure.[a]

In addition to this principal formula for the relation between attenuation and phase there are a number of subsidiary expressions for special aspects of the problem. One in particular, relating the attenuation to the behavior of the phase characteristic in the neighborhood of zero frequency, is used extensively in this chapter.

[Footnote a] In limiting cases, such as may be found when the transfer admittance contains zeros or poles exactly on the real frequency axis, the "physical structure" may require such constituents as ideally nondissipative reactances, perfect amplifiers with unlimited gain, etc. This, however, is of no consequence for the present general discussion.

8.1 THE SIGNAL SPECTRUM

It is natural to begin with a discussion of the spectrum of a typical target path. Unfortunately no data on the spectra of actual measured airplane paths exist, and the theoretical assumptions which may be made about paths of airplane targets are best discussed in the next chapter. This section consequently will be confined to rather general observations about the problem.
It will be convenient to assume for definiteness that the quantities to be smoothed are the velocity components in Cartesian coordinates.

The simplest point of departure is furnished by the conventional assumption that the target flies in a straight line at constant speed. If we could construe this assumption literally, it would mean that the velocity spectrum in rectangular coordinates would reduce to a single line at zero frequency. In practice, of course, the spectrum is not so simple. Even in the absence of deliberate maneuvering, the target will fly a slightly curved path because of "wander." Moreover, even if the target could fly exactly straight, the single line spectrum would apply only to a straight course indefinitely continued. The spectrum becomes more complicated if we consider the fact that tracking must have begun at some finite time in the past, or that the target may presumably change occasionally from one straight line course to another.

As a result of both these causes, the actual signal spectrum must be regarded as occupying a band bordering on zero frequency. The distribution of energy in detail will, of course, depend on particular circumstances. The band has no very well defined upper limit, but in most cases the great bulk, at least, of the energy should be below, say, one-fourth or one-fifth of a cycle per second. For example, the natural periods of a heavy airplane, which one would expect to be correlated with wander, are below this limit. This limit is also sufficient to include most of the energy resulting from changes in course occurring as frequently as every ten or twenty seconds.

In general, it is to be supposed that the signal spectrum varies in amplitude as $\omega^{-n}$, where $n$ may be 1, 2, 3, depending on the frequency range. This follows from general considerations of the limitations of airplane performance.
Thus, if we suppose that the velocity changes discontinuously from time to time, it follows from general Fourier principles that the amplitude must vary as $\omega^{-1}$. This is presumably a fair representation of the actual signal spectrum at low frequencies. At moderate frequencies, however, we must take account of the fact that the velocity can actually be changed rapidly but not discontinuously, and we consequently assume that the amplitude begins to vary as $\omega^{-2}$. Finally, at frequencies of the order of perhaps one cycle per second one must take account of the fact that the airplane must bank in order to turn. Since it takes some time to roll into the bank, even the acceleration in the lateral direction cannot be discontinuous, and consequently the amplitude must begin to vary as $\omega^{-3}$. The application of such successive limiting factors in constructing a complete spectrum is described in more detail in Section A.8 of Appendix A.

One other general condition of the same kind can be mentioned. It can be shown that the integral from zero to infinity of $\log H/(1 + \omega^2)$, where $H$ is the power spectrum, is very important in determining the properties of a time series. More explicitly, the integral converges if the series is essentially statistical, so that we cannot foretell the future from the past with absolute certainty. This of course is the case with an actual signal spectrum in a fire-control problem. It implies two consequences: first, that $H$ cannot be zero over any finite band; and second, that in the neighborhood of infinite frequency $H$ diminishes slowly enough so that $|\log H|/\omega \to 0$.

8.2 THE NOISE SPECTRUM

The spectrum of tracking errors depends largely upon the particular sort of tracking equipment involved. Broadly speaking, optical tracking equipment (at least that of the present or recent past) tends to produce tracking errors not only of small amplitude, but also of low frequency, so that they are hard to separate from the signal spectrum.
Radar equipment of the present time produces higher-frequency errors. Relatively high-frequency errors are particularly likely to be found in very stiff automatic tracking radars.

A number of examples of spectra of tracking errors are shown in Figures 1, 2, and 3. The spectra are given directly in terms of range and angle errors. To make them comparable with the velocity spectra described previously it would be necessary to multiply all amplitudes by $\omega$. In addition, it would of course also be necessary to multiply the angle rates by some suitable range in order to compare them directly with the yards-per-second rates we have otherwise considered. After multiplication by $\omega$, the radar spectra appear to be about flat up to perhaps one cycle. Beyond that point they no doubt drop off slowly, although the accuracy of the data is not sufficient to permit the situation to be stated very exactly.

[Figure 1. Power spectrum of range errors of experimental radar. RMS = 30 yd; median = 0.022 cps.]

[Figure 2. Power spectrum of angular height errors of experimental radar. RMS = 1.0 mil; median = 0.53 cps.]

8.3 RANDOM NOISE FUNCTIONS

The properties of the signal and noise as we assume them here can be conveniently expressed by reference to the theory of so-called "random noise" functions.[b] A random noise can be defined as a function which has a definite amplitude spectrum but completely random phase characteristics. The theory of such functions is well developed because of their frequent occurrence in physics. It is probable that neither our noise functions nor our signal functions are, strictly speaking, random noise according to this definition.
Thus, there are probably certain definite phase relations in our noise functions because of the physical characteristics of tracking devices. There is no evidence, however, that any such relations are important enough to be significant in the data-smoothing problem, so that we are fully justified in identifying them with random noise functions as defined above. The phase relations in the signal are by no means random. As long as we consider only the mean square error, however, this factor is immaterial, and we can replace the actual signal by a random noise function with the same power spectrum for purposes of analysis.

[Footnote b] The fact that we also refer to tracking errors as "noise" is, of course, merely a coincidence.

The most familiar example of a random noise function is furnished by the thermal voltage across a resistance $R$. This is a random noise whose spectrum is constant up to very high frequencies with the value

$$P = 4kTR$$

($k$ is Boltzmann's constant and $T$ the absolute temperature). A second example is black body radiation. If there is black body radiation in a space, the electric (or magnetic) field intensity at a point is a random noise function with spectrum

$$P(f) = \frac{8\pi h f^3}{c^3}\,\frac{1}{e^{hf/kT} - 1}$$

according to Planck's law. Random noise functions also occur in the Schottky effect, in Brownian motion, and in diffusion and heat flow problems.

[Figure 3. Power spectrum of traverse errors of experimental radar. RMS = 1.4 mil; median = 0.31 cps.]

For purposes of analysis, a random noise function can be thought of as a function made up of a large number of sinusoidal components, which are very closely spaced in frequency and whose phases are completely random. Thus a random noise can be represented as

$$\sum_{n=1}^{N} a_n \cos(\omega_n t + \phi_n)$$

where $\omega_n = 2\pi n\,\Delta f$, $\Delta f$ being the frequency difference between adjacent components.
The phase angles $\phi_n$ are random variables which are independent with a uniform probability distribution from 0 to $2\pi$. As $\Delta f$ decreases the functions in this ensemble approach, in a certain sense, a limiting ensemble, providing the amplitudes $a_n$ are adjusted properly. What is desired is to have the total power in the neighborhood of each frequency approach a certain limit $P(f)$, the power spectrum at that frequency. To do this we make $a_n^2 = 2P(f)\,\Delta f$. In the limiting ensemble the total power within a small frequency range $\Delta f$ is then $P(f)\,\Delta f$. The function $P(f)$ completely describes the random noise ensemble from the statistical point of view.

A particularly important special case is that of a random noise with a constant power spectrum. This is often called "flat" or "white" noise. True constancy out to infinite frequencies is of course impossible since it would imply an infinite total power in the function. The idea is, however, still useful and can be approximated, as with resistance noise, by having a spectrum which is constant out to such high frequencies that behavior beyond this point is of no importance to the problem. We may conveniently think of flat random noise as being made up of a succession of weak impulses occurring frequently but at random times with respect to one another. This results from the fact that a Fourier analysis of a single impulse gives a flat spectrum, and the random occurrence of many of them produces a random set of phases. In a physical problem, such as resistance noise or Brownian motion, these impulses might correspond to the effects of individual small particles. Such a situation is of course completely chaotic. If the impulses are large and occur relatively infrequently, the power spectrum is still flat, though the function is no longer a random noise function as defined here.
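The sum-of-sinusoids construction can be tried numerically. The following is a minimal sketch and not part of the original report: the flat spectrum, the spacing `DELTA_F`, the number of components `N`, and all function names are illustrative assumptions, and the angular frequency is taken as $2\pi n\,\Delta f$. It draws independent uniform phases, sets each amplitude from $a_n^2 = 2P(f_n)\,\Delta f$, and compares the time-averaged power over one full period with the sum of $P(f_n)\,\Delta f$.

```python
import math
import random


def random_noise_components(power_spectrum, delta_f, n_components, seed=0):
    """Return (amplitude, angular frequency, phase) triples for the model."""
    rng = random.Random(seed)
    components = []
    for n in range(1, n_components + 1):
        f_n = n * delta_f
        # Each cosine carries power a_n**2 / 2, so a_n**2 = 2 * P(f_n) * delta_f.
        a_n = math.sqrt(2.0 * power_spectrum(f_n) * delta_f)
        phi_n = rng.uniform(0.0, 2.0 * math.pi)  # independent, uniform phase
        components.append((a_n, 2.0 * math.pi * f_n, phi_n))
    return components


def sample(components, t):
    """Value of the random noise function at time t."""
    return sum(a * math.cos(w * t + phi) for a, w, phi in components)


def flat_spectrum(f):
    """Hypothetical constant ("white") power spectrum, for illustration only."""
    return 1.0


DELTA_F = 0.05          # assumed frequency spacing, cycles per second
N = 40                  # assumed number of components
components = random_noise_components(flat_spectrum, DELTA_F, N)

period = 1.0 / DELTA_F  # every component repeats after this time
steps = 4000
mean_square = sum(
    sample(components, period * i / steps) ** 2 for i in range(steps)
) / steps
expected_power = sum(flat_spectrum(n * DELTA_F) * DELTA_F for n in range(1, N + 1))
```

Because the components are harmonics of a common spacing, the average over one full period does not depend on the particular phases drawn, which is one way to see that the power spectrum alone fixes the mean square value.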
This conception, which corresponds to a physical situation including definite causative elements, will be revived later under the name of the elementary pulse method of analysis.

Random noise functions have a number of interesting characteristics. For example, they have the "ergodic property." This means that averaging a statistic along the length of a particular random function gives the same results as averaging the same statistic over an ensemble of functions having the same power spectrum. Each function is typical of the ensemble. To be more precise one must admit exceptions, but the probability of an exception is zero. For example, if we determine the fraction of time a given random function $f(t)$ has a value greater than some constant $A$, it will be equal to the fraction of all functions in the ensemble which are greater than $A$ at $t = 0$ (with probability 1).

A second characteristic of random noise functions is the fact that they frequently lead to Gaussian or normal law distributions. For example, the amplitudes of a random noise function are distributed about zero in accordance with the normal error law. Likewise, the amplitudes for two points spaced a given distance apart form a two-dimensional normal error law distribution when we consider all possible positions of the first point. It is apparent that if the signal and noise are actually random functions the mean square error is as good a criterion of performance as any other, since it completely fixes the distribution in a normal law case.

A final property of random noise functions is the fact that if a random noise is passed through a filter the output is still a random noise. If the power spectrum of the noise is $P(\omega)$ and the transfer characteristic of the filter is $Y(i\omega)$, the output spectrum is $P(\omega)\,|Y(i\omega)|^2$. In particular, if we take the derivative of a random noise with spectrum $P(\omega)$ we obtain one with spectrum $\omega^2 P(\omega)$.
This last property of random noise functions suggests a method of representing them which we shall find useful in the future. The method is represented by Figure 4. It consists of a source of flat noise followed by a shaping filter to give the desired power spectrum. We can easily assign to the filter the characteristics of a physically realizable structure by making use of the relations between attenuation and phase mentioned earlier in the chapter. It is merely necessary to convert the desired power spectrum into a specification of the attenuation characteristic of the filter and then use the loss-phase formula to compute the corresponding phase shift. It will be assumed that this procedure has been followed when we make use of this circuit at a later point.

[Figure 4. Circuit representation of random functions: a flat noise source followed by a shaping filter.]

The method of representing random functions shown by Figure 4 illustrates graphically the basis of the prediction schemes described thus far. The flat noise is of course absolutely unpredictable. The history of the function up to any given instant gives no indication of its value even a microsecond later. The filter, however, forces the output current to have a certain structure on which a prediction may be based. For example, if the filter will pass only very low frequencies it is clear that the output can change very little in a microsecond.

8.4 THEORETICAL PROPORTIONS FOR A DATA-SMOOTHING FILTER

The signal and noise spectra furnish the raw material from which a suitable data-smoothing filter can be deduced. We have still to determine, however, the exact rule for choosing the cutoff and attenuation characteristic of the filter from these spectra. It is clear that previous experience with signal-to-noise problems in systems transmitting voice or music is no help, since the filter proportions here depend upon psychological considerations of no relevance to the fire-control problem.
For example, the interfering effect of a small amount of noise is much greater than one might expect from energy considerations, especially in intervals of low message level, and it is consequently worth while to maintain a relatively high level of attenuation in the noise band. Conversely, the breadth of the band required for the message depends as much on the ability of the ear to reconstruct a complete signal from an incomplete one as it does upon the actual signal power spectrum.

In the data-smoothing case a suitable criterion, dependent upon more physical considerations, can be obtained by minimizing the rms error at the filter output. This criterion is easily developed from the power spectrum approach, and in a sense it is, of course, the only possible one as long as we follow the methods developed thus far.

A very general theory for the minimization of the rms error of the filter output has been developed by Wiener. Since the power spectrum approach is not the one we shall eventually follow, however, it is not necessary to give this analysis in detail. The nature of the relationships can be seen from an elementary computation. In Figure 5 let $OA$ be a vector representing the signal component at some particular frequency. Let the amplitude ratio between the input and output of the data-smoothing filter be $x$, and let it be assumed that the system is phase distortionless. This can always be accomplished, at the cost of lag, by phase equalization. Then the actual signal output can be represented by $OB$, where $OB/OA = x$. Let the ratio of noise power to signal power at this frequency be $k^2$. Then the output noise can be represented by the vector $BC$, at some arbitrary phase angle $\theta$, where $BC/OA = kx$.

[Figure 5. Vector relation between input and output of data-smoothing network.]

The error in the output of the data-smoothing filter is evidently represented by the vector $AC$. We have

$$(AC)^2 = (OA)^2\left[(1 - x - kx\cos\theta)^2 + (kx\sin\theta)^2\right] = (OA)^2\left[(1 - x)^2 - 2kx(1 - x)\cos\theta + k^2x^2\right].$$

Since $\theta$ is random the cross-product term involving $\cos\theta$ disappears on the average. (More generally, it disappears as long as the noise and signal are uncorrelated, whether or not their relative phases are entirely random.) This leaves the mean square error as

$$\overline{(AC)^2} = (OA)^2\left[1 - 2x + (1 + k^2)x^2\right]. \qquad (1)$$

The mean square error is a minimum if

$$x = \frac{1}{1 + k^2} = \frac{P_S}{P_N + P_S}, \qquad (2)$$

where $P_S$ and $P_N$ are, respectively, the signal and noise power at this frequency. Upon substituting this result in equation (1) and remembering that $(OA)^2 = P_S$, we find that the minimum mean square error is

$$\overline{(AC)^2}_{\min} = \frac{P_S P_N}{P_S + P_N}.$$

Equation (2) evidently represents the sought-for rule for the filter transmission characteristic. It is illustrated in Figure 6, where $P_N$ and $P_S$ have been chosen respectively as the flat curve and the $1/\omega^2$ curve in Figure 7. In comparison with the characteristics of typical filters in communication systems it is quite rounded with a relatively slowly falling amplitude characteristic. More important than the detailed rule for the transmission characteristic, however, is the conclusion that the shape of the characteristic is not very critical. There is very little loss in replacing the actual curve in Figure 6 by any other similar characteristic. For example, we might validate the assumption of zero phase distortion by making use of the curve which automatically gives a linear phase shift.

[Figure 6. Optimum transmission characteristic for data smoothing assuming signals with random noise characteristics.]

[Figure 7. Spectra assumed in Figure 6.]
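The minimization can be checked with a few lines of arithmetic. This is a minimal sketch, not taken from the report: it assumes the mean square error has the form $(OA)^2[1 - 2x + (1 + k^2)x^2]$ with $(OA)^2 = P_S$ and $k^2 = P_N/P_S$, and the power values and function names are hypothetical. A coarse grid search over the transmission $x$ is compared with the closed-form optimum $P_S/(P_S + P_N)$.

```python
def mean_square_error(x, p_signal, p_noise):
    """Mean square error at one frequency for amplitude transmission x."""
    k_squared = p_noise / p_signal  # noise-to-signal power ratio
    return p_signal * (1.0 - 2.0 * x + (1.0 + k_squared) * x * x)


def optimum_transmission(p_signal, p_noise):
    """Transmission that minimizes the mean square error."""
    return p_signal / (p_signal + p_noise)


def minimum_error(p_signal, p_noise):
    """Mean square error obtained at the optimum transmission."""
    return p_signal * p_noise / (p_signal + p_noise)


# Hypothetical powers at one frequency (arbitrary units).
P_SIGNAL, P_NOISE = 4.0, 1.0

x_best = optimum_transmission(P_SIGNAL, P_NOISE)

# Brute-force search over 0 <= x <= 1 in steps of 0.001.
grid = [i / 1000.0 for i in range(1001)]
x_grid = min(grid, key=lambda x: mean_square_error(x, P_SIGNAL, P_NOISE))

error_at_best = mean_square_error(x_best, P_SIGNAL, P_NOISE)
```

With these illustrative numbers the error curve is a shallow parabola around its minimum, which is consistent with the remark that the exact shape of the transmission characteristic is not very critical.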
A more extreme illustration is furnished by the infinitely selective filter characteristic, with perfect transmission in the range in which the signal power is greater than the noise power, and zero transmission elsewhere, indicated by the broken lines in Figure 6. It follows from equation (1) that in the neighborhood of the cutoff point $\omega_0$ the mean square error for this filter is twice that of the optimum structure. In most frequency ranges, however, the penalty is far less than this. Since even a two-to-one change in the mean square error would produce no tremendous improvement in the effectiveness of fire, it is clear that the result to which we are led by this method of attack is by no means critical.

8.5 LAGS IN DATA-SMOOTHING FILTERS

The analysis just concluded has been directed at the amplitude characteristics of a data-smoothing filter. By virtue of the relations between the amplitude and phase characteristics of physical networks mentioned earlier in the chapter, however, the analysis permits us to give at least a partial description also of the phase characteristics of the filters. This is an important consideration because it bears upon the question of time delays in data-smoothing systems which was mentioned in Chapter 7.

The general nature of the relationship in simple cases is illustrated by Figures 8 and 9. Figure 8 shows a series of rising attenuation characteristics equivalent to rather unselective falling amplitude characteristics of the general type shown by the principal curve in Figure 6. Figure 9 shows the corresponding phase characteristics computed on a minimum phase shift basis.

[Figure 8. Some filter attenuation characteristics.]

[Figure 9. Corresponding minimum phase characteristics.]
In Figure 8 the central attenuation characteristic B has been so chosen that the corresponding phase characteristic in Figure 9 is exactly a straight line at low frequencies, where the transmitted amplitudes are appreciable. Curves A and C in the two drawings show slightly different cases, but it is clear from the figures that the tendency of the phase characteristics to approximate linearity is still marked.

In communication engineering a phase characteristic proportional to frequency is interpreted as indicating a delay in seconds equal to the slope $dB/d\omega$ of the phase characteristic. This relation is illustrated most simply by an ideal line. The ideal line has zero attenuation combined with a phase shift which is proportional to frequency and which at any given frequency is also proportional to the length of the line in question. If we apply any arbitrary wave to the line it is propagated down the line with a definite velocity and unchanged wave form. The time required for the wave to reach any point on the line is equal to the slope of the phase characteristic to that point.

In a structure like a filter, which has an attenuation characteristic varying with frequency, it is of course no longer possible to transmit an arbitrarily impressed wave without change in wave shape. Even if the applied wave is merely a suddenly applied d-c voltage or single frequency sinusoid, there is a transient period before the response approximates its final value. In structures having a substantially linear phase characteristic over any frequency range in which they exhibit an appreciable amplitude response, however, this total transient characteristic falls naturally into two parts. The first is a waiting period equal to the slope of the phase characteristic, during which the response is very small, whereas the second is a true transient period in which the response is substantial but does not resemble the final steady-state response.
This is illustrated by Figure 10, which shows the voltage at the fifth section of a conventional low-pass filter in response to a d-c voltage applied at zero time at the input terminals. The end of the waiting period, as deduced from the slope of the phase characteristic, is indicated by the broken line.

[Figure 10. Voltage at fifth section of conventional low-pass filter in response to unit d-c voltage.]

Delays of the sort just illustrated must be expected in a data-smoothing filter whenever the nature of the signal is changed. This happens at the beginning of tracking, in changing from one target to another, or even in following a single target when the target makes an abrupt change in course. Since usable data in a fire-control system must be quite accurate, the delay to be allowed for must include both the initial waiting period and the subsequent transient period until the transient ripples have almost vanished. A considerable part of the art of designing data-smoothing networks consists in controlling the design so that these final transient ripples decay relatively rapidly. We are not yet ready to discuss this problem. It will turn out, however, that the minimum interval which can be assigned to the "true transient" period is about equal to that which must be allowed for the initial waiting period.[c] Thus the slope of the phase characteristic can be used as an index of the lags which must be expected in data smoothing merely by doubling the delay to which the slope would normally be said to correspond.

When we use the phase slope as an index of delay it becomes immediately apparent that lags are the necessary consequence of smoothing in physical circuits. This is easily seen by reference to the relations which must exist between attenuation and phase characteristics in physical structures.
An example is provided by the formula

$$\left(\frac{dB}{d\omega}\right)_{\omega = 0} = \frac{2}{\pi}\int_0^\infty \frac{A - A_0}{\omega^2}\,d\omega \qquad (3)$$

where $A$ is attenuation, $A_0$ is the attenuation at zero frequency, and $B$ is phase shift. In other words, the delay (measured by the slope of the phase characteristic at zero frequency) is proportional to the integral of the attenuation on an inverse frequency scale when the attenuation at zero frequency is taken as the reference. The equation thus states that the system will exhibit a lagging response as long as there is a net high-frequency attenuation. As a numerical illustration, let it be supposed that $A$ is zero below $\omega = 1$. This corresponds to the estimate made earlier in the chapter that the input signal components in antiaircraft work lie roughly in the band below about 0.1 or 0.2 cycle per second. Let it be supposed also that $A$ at higher frequencies is equal to 3 nepers, corresponding to an average amplitude reduction of about 20 to 1. Then $dB/d\omega$ at the origin is given from equation (3) as $6/\pi$ seconds, and in accordance with the rule just enunciated the minimum delay to be expected from such a structure in a data-smoothing application would consequently be $12/\pi$ seconds.

[Footnote c] This is not intended to imply that the distinction between the initial waiting period and the "true transient" period is quite as sharp as it is in Figure 10. The selectivity in a data-smoothing filter is usually not great enough to justify the assumption that components beyond the linear phase region are of negligible importance.

Aside from such specific quantitative relations equation (3) is useful as a basis for a number of important qualitative conclusions. One, for example, is the fact that although a lag is a necessary concomitant of any system showing a high-frequency attenuation, the amount of the lag depends greatly upon the portion of the frequency spectrum in which the attenuation is found.
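The numerical illustration can be reproduced with a short calculation. The sketch below is not from the report and rests on a stated assumption: that equation (3) has the form $(dB/d\omega)_{\omega=0} = (2/\pi)\int_0^\infty (A - A_0)\,\omega^{-2}\,d\omega$, which is the form that yields the quoted $6/\pi$ seconds. The function names, the integration limit `u_max`, and the step count are hypothetical. The substitution $u = 1/\omega$ puts the attenuation on the inverse frequency scale mentioned in the text.

```python
import math


def phase_slope_at_zero(attenuation, a_zero, u_max=2.0, steps=2000):
    """Approximate (2/pi) * integral of (A(w) - A0) / w**2 dw over w > 0.

    With u = 1/w the integrand becomes A(1/u) - A0, integrated over
    0 < u < u_max by the midpoint rule. It is assumed that the attenuation
    equals a_zero for all w < 1/u_max, so nothing is lost beyond u_max.
    """
    du = u_max / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * du  # midpoints avoid u = 0
        total += (attenuation(1.0 / u) - a_zero) * du
    return 2.0 / math.pi * total


def step_attenuation(w):
    """Attenuation in nepers: zero below w = 1, 3 nepers above."""
    return 3.0 if w > 1.0 else 0.0


slope = phase_slope_at_zero(step_attenuation, a_zero=0.0)  # seconds
delay = 2.0 * slope              # waiting period plus transient period
amplitude_ratio = math.exp(3.0)  # 3 nepers expressed as an amplitude ratio
```

Replacing `step_attenuation` with one that places the same 3 nepers at a lower or higher band edge shows directly how strongly the lag depends on where in the spectrum the attenuation sits.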
Since the integral is taken on an inverse frequency scale, a small attenuation at low frequencies is much more important than a considerably greater attenuation further out in the spectrum. This points to the desirability of designing tracking instruments which generate principally high-frequency noise, even if the amplitude of the noise is somewhat increased thereby. We may also notice that since the attenuation is a logarithmic function of amplitude an initial moderate reduction in the amplitude of disturbing noise may be much less expensive in lag than subsequent attempts at further reduction. For example, an amplitude reduction from 100 to 10 per cent over a given portion of the frequency spectrum produces no more lag than a subsequent reduction from 10 to 1 per cent.

WIENER'S PREDICTION THEORY: ZERO NOISE CASE

In Chapter 7 we distinguished between what we called the simple data-smoothing problem and the data-smoothing and prediction problem. The simple problem, with which this report is chiefly concerned, is the one which has been given principal attention thus far. On account of its broad interest, however, it seems worth while to include also a brief statement of Wiener's solution of the general problem. The method of development used here is intuitive and nonrigorous in comparison with Wiener's own development, but it permits the principal relations to be established by very elementary means.

It is convenient to consider first the zero noise case. The past history of the signal, then, is known perfectly, and the existence of a prediction problem depends entirely upon the fact that since the signal is assumed to be statistical in character, its future is not completely determined from its past. The situation can be thought of in the terms suggested by Figure 11. The actual signal output appears at $P_1$. In accordance with the discussion earlier in the chapter, we imagine this signal to be generated by passing flat noise through the shaping network $N_1$. The transfer admittance $Y_1(i\omega)$ of $N_1$ is determined from the power spectrum of the signal by the procedure outlined earlier and is a minimum phase shift characteristic. It will be recalled that minimum phase shift transfer admittances have the important property that their reciprocals are also the transfer admittances of physically realizable networks.

[Figure 11. Schematic representation of Wiener's prediction theory when there is no noise. Blocks: flat noise source, shaping network, network, predicting network.]

From $Y_1$ we can readily compute the transient response characteristic of $N_1$. We shall assume for illustrative purposes that the impulsive admittance of $N_1$ takes the special shape shown by Figure 12.

[Figure 12. Assumed impulsive admittance of shaping filter.]

The flat noise is thought of as consisting of a large number of elementary impulses with random amplitudes and occurring at random times. For the purposes of this analysis, however, it is sufficient to consider only the three unit impulses shown in Figure 13. Impulse B is supposed to occur at the instant at which the prediction is to be made, A occurs two seconds in the past, and C, one second in the future. The response of $N_1$ to these three impulses will evidently be three curves of the sort given by Figure 12, suitably displaced in time as shown by Figure 14.

[Figure 13. Impulses giving rise to applied signal through shaping filter.]

[Figure 14. Applied signal at $P_1$.]

The desired output of the predicting network is the curve of Figure 14 advanced by the prediction time, which we can assume, for illustration, to be two seconds. It may be assumed for the sake of preliminary analysis that the input of the predicting network is the three original impulses of Figure 13. The terminal $P_2$ at which they are supposed to appear is of course a purely fictitious one and is not accessible to us physically. We can, however, construct the equivalent terminal $P'_2$ by imposing the actual signal from terminal $P_1$ on the network $N_2$, whose transfer admittance is the reciprocal of that of $N_1$.

Let the predicting network connected to terminal $P'_2$ be represented by $N_3$. Obviously a perfect prediction would be secured if $N_3$ could be assigned the impulsive admittance shown in Figure 15, that is, an impulsive admittance equal to the impulsive admittance of the original network but moved forward by the 2-second prediction time. Then all the constituent curves and the sum curve in Figure 14 would similarly be moved forward. Of course we cannot assign $N_3$ an impulsive admittance which is different from zero at negative times without postulating a nonphysical network. It is, however, perfectly possible to define $N_3$ from the portion of the impulsive admittance characteristic at positive times, with the remainder set equal to zero. This gives an impulsive admittance of the type shown by Figure 16. When energized by the three unitary impulses, it gives the result shown in Figure 17. The contributions of impulses A and B are not affected by the absence of a negative time portion of the impulsive admittance, but the contribution of impulse C is lost.

[Figure 15. Ideal impulsive admittance of prediction network $N_3$ in Figure 11.]

[Figure 16. Realizable portion of required impulsive admittance.]

[Figure 17. Response of realizable prediction network.]

To formulate a physical prediction network we have merely to find by conventional methods the steady-state admittance $Y_3$ corresponding to the impulsive admittance of Figure 16. The two networks $N_2$ and $N_3$ may then be combined to give a single structure with the transfer admittance $Y_2 Y_3 = Y_3/Y_1$, which will give the complete prediction when energized by the actual signal.

The mean square error in prediction is easily determined from the fact that the contributions of all impulses of the sort represented by C, occurring in the prediction interval, are lost. Since impulses in the flat noise source occur at random times the mean square error is proportional to

$$\int_0^{\alpha} W^2(\tau)\,d\tau ,$$

where $\alpha$ is the prediction time and $W$ is the impulsive admittance of Figure 12. Since the flat noise impulses occurring after the time at which the prediction is made are surely unpredictable, it is clear that this error is the least we could expect any physical prediction network to have.

WIENER'S THEORY: GENERAL CASE

When the input data includes noise as well as the signal it is natural to think of the situation in the manner shown by Figure 18. The first source of flat noise, together with the shaping network $N_a$, is the combination we have already used to represent the signal in the noise-free case. The addition of noise is represented by the second independent source of flat noise with its associated shaping network $N_b$. They combine to give the total input measured at $P_1$.

[Figure 18. Circuit representation of random functions representing signal and noise.]

This diagram emphasizes the fact that we think of the noise and signal as originating from different physical sources. By postulate, however, we are not able to separate the sources experimentally. So far as any observed result is concerned, consequently, we may as well deal with the simplified structure shown in Figure 19, which contains a single source of flat noise and a single shaping network. The transfer admittance of the shaping network $N_1$ is determined by adding the power spectra of signal and noise, converting the result to an amplitude characteristic, and computing the corresponding minimum phase according to methods already used for the noise-free case (see footnote d).

[Figure 19. Schematic representation of Wiener's prediction theory when there is noise.]

Footnote d: Note that the shaping network thus obtained is not the same as the one we would secure by adding the transfer admittances of $N_a$ and $N_b$ in Figure 18 directly. In order to realize the same total power at $P_1$ in each case, it is necessary to begin by adding the powers rather than the amplitude characteristics associated with the two paths.

Although we cannot separate the signal from the noise completely, we saw earlier that the mean square difference between the total input and the signal is minimized if we multiply the amplitude of the input at each frequency by the ratio of the signal power to the sum of the signal and noise powers. A fictitious filter having the prescribed amplitude characteristic is represented by $N_4$ in Figure 19. We assign $N_4$ a zero phase characteristic so that there may be no lag in producing the result at $P_3$. Thus the output at $P_3$ at any instant represents the best conceivable estimate (in the least squares sense) of the signal at that instant. The assumption of zero phase, of course, makes $N_4$ nonphysical, since it must have at least the minimum phase characteristic associated with its prescribed amplitude characteristic. This, however, is not an objection here since the structure is introduced purely for purposes of analysis.

The situation is now reduced to a form in which it is substantially equivalent to the one appearing in the zero-noise case. We assume a series of random impulses at $P_2$, which would produce responses at $P_3$. The problem is that of advancing the response to each impulse so that the same result appears $\alpha$ seconds earlier at terminal $P_4$.
The solution is represented by networks $N_2$ and $N_3$, which discharge functions similar to those of the correspondingly labeled networks in Figure 11. Thus, the network $N_2$ is the reciprocal of $N_1$ and is provided to make terminal $P'_2$ equivalent to $P_2$ as a source of impulses. Network $N_3$ is defined by an impulsive admittance obtained from the impulsive admittance between $P_2$ and $P_3$ by advancing the latter characteristic $\alpha$ units in time and then discarding the portion at negative time.

In this procedure there is only one point at which the situation differs from that without noise. In the noise-free case, the original impulsive admittance which we wished to advance in time was identically zero at negative times. In order to secure a physically realizable result, we needed only to discard the portion of the impulsive admittance between $t = 0$ and $t = \alpha$. In the present situation, on the other hand, the impulsive admittance is taken from a path including the nonphysical network $N_4$. Thus the admittance may be expected to take such form as that shown in Figure 20, with nonzero amplitudes at both negative and positive times, and in order to secure a physical final network it is necessary to discard everything to the left of the line $\alpha$.

[Figure 20. Typical impulsive admittance of best smoothing network $N_4$ in Figure 19.]

This difference in the impulsive admittance characteristics has two consequences. The first is the fact that since the uncertainty of the prediction is measured by the amount of impulsive admittance which must be discarded, it is evidently greater in the present case where we are discarding much more. The second is the fact that in the noise-free case uncertainty exists only for a positive prediction time.
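The "advance, then discard" step common to both cases can be sketched numerically. The example below treats only the noise-free case and assumes, purely for illustration, an impulsive admittance $e^{-t}$ sampled on a uniform grid; the helper name `split_advanced_response` is hypothetical. The discarded energy it returns approximates the integral of $W^2$ over the prediction interval, to which the mean square error is proportional.

```python
import math

def split_advanced_response(w, dt, alpha):
    """Advance a sampled impulsive admittance by alpha and drop negative times.

    w[k] is the admittance at time k*dt (k >= 0); alpha is the prediction
    time. Returns (kept, lost_energy): kept[k] is the realizable admittance
    at time k*dt, and lost_energy is a Riemann-sum estimate of the integral
    of w(t)**2 over 0 <= t < alpha.
    """
    n = int(round(alpha / dt))          # samples that fall at negative time
    kept = w[n:]                        # portion remaining at t >= 0
    lost_energy = sum(x * x for x in w[:n]) * dt
    return kept, lost_energy

dt = 1e-3
w = [math.exp(-k * dt) for k in range(20000)]   # assumed admittance exp(-t)
kept, lost = split_advanced_response(w, dt, alpha=2.0)

exact = (1.0 - math.exp(-4.0)) / 2.0   # closed form for this assumed shape
print(lost, exact)
```

For an admittance with nonzero values at negative times, as in the noisy case, the same slicing would simply discard a longer stretch of samples, which is why the uncertainty is larger there.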
A negative prediction time, which corresponds, of course, to the determination of the value assumed by the signal at some time in the past, can be set into the analysis as easily as a positive prediction time, merely by shifting the impulsive admittance to the right rather than the left. In the noise-free case, however, there is nothing to be discarded when we shift to the right, since the impulsive admittance with which we begin is in any case identically zero for negative times. Thus the uncertainty in the determination of any past value of the signal is zero. Since we have postulated no noise to confuse the data, this is, of course, an inevitable result. As soon as noise is included, on the other hand, there is no such sharp distinction between the future and the past (see footnote e). The uncertainty in the determination of the true value of the signal in the near past is almost as great as it is in estimating what the signal will be in the near future. As we go further and further into the past the uncertainty gradually diminishes. If we can allow ourselves unlimited lag, we at length reach a point at which the discarded portion of the impulsive admittance characteristic is negligibly small. This, however, does not mean that all uncertainties have disappeared, but merely that we can base our estimate of the signal upon the power-ratio rule developed previously.

Footnote e: This statement is to be understood in a physical rather than a mathematical sense. It is not intended to imply that there may not be sharp changes of behavior in the impulsive admittance at zero.

8.8 OVERALL CHARACTERISTICS OF PREDICTING NETWORKS

It has been fairly easy to develop a qualitative picture of the general characteristics of typical data-smoothing networks. As we have seen, they have amplitude characteristics of the low-pass filter type combined with lagging phase shifts.
No corresponding qualitative picture of the characteristics of a typical overall predicting circuit has, however, been developed as yet. The discussion just concluded provides a rule for determining the characteristics of a predicting circuit in any given case, but provides comparatively little in the nature of a description of the result we may expect to secure.

In any particular situation we can, of course, calculate the overall characteristics of the predicting circuit. A simpler way of characterizing the overall predictor characteristic qualitatively, however, is based upon the use of the attenuation-phase relations for physical networks. We need merely use such an equation as (3) backward. Thus, we have previously shown that a positive phase slope corresponds to a lagging output. Correspondingly, a negative phase slope can be interpreted to represent a lead, or in other words, a prediction (see footnote f). If we assign $(dB/d\omega)_{\omega=0}$ in equation (3) a negative value, we see that $A - A_0$ must on the average be negative. In other words, the amplitude characteristic of an overall prediction circuit must rise, on the average, as we proceed upward from zero frequency. This is in marked contrast to a data-smoothing network, which, as we have seen, tends to have a low-pass filter type of characteristic with a falling amplitude characteristic at high frequencies. The increased amplitude of response may have two detrimental effects. In the first place, it evidently produces a distorting effect on any signal components to which it applies. In the second place, it produces an exaggerated response to noise.

Footnote f: This, of course, does not mean that a network with a negative phase slope can predict a perfectly arbitrary event. We can hope to realize a negative phase slope, in combination with a flat amplitude characteristic, over only a finite band. The spectrum of an arbitrary event, that is, any suddenly applied signal, will always include important components running out to infinite frequency, where the negative phase slope can no longer be realized. The statement does, however, mean that if we suddenly apply a signal made up of one or more low-frequency sinusoids, and wait for the steady state to become established, the output will appear to lead the input by a time equal to the slope of the negative phase characteristic.

Examples of the characteristics of overall prediction circuits are readily constructed by reference to the circuit of Figure 21. Various particular results are obtained by assigning particular characteristics to the data-smoothing network. Thus, if the data-smoothing network is absent entirely the transmission through the path containing the differentiator is $i\omega t_f$, since differentiation is equivalent to multiplication by $i\omega$. The attenuation of the overall circuit is consequently $A = -\log|1 + i\omega t_f|$. This is plotted as curve I of Figure 22. The increasing amplitude characteristic at high frequencies is obviously due fundamentally to the increased transmission through the differentiator circuit.

[Figure 21. One-dimensional prediction circuit with data-smoothing networks.]

If the data-smoothing network is assigned the characteristic $(1 + i\omega\alpha)^{-1}$, corresponding to a very simple low-pass filter type of response, the overall transmission becomes that shown by curve II in Figure 22. (It is assumed that $\alpha = t_f$, for simplicity.) The negative attenuation at high frequencies is much reduced. This is paid for by an increased amplitude of response at low frequencies, but since the integration in (3) takes place on an inverse frequency scale, the low-frequency fragment is much less than the gain reduction at high frequencies. Curve III shows the result when the data-smoothing network is assigned the characteristic $(1 + i\omega\alpha)^{*}$ [exponent illegible].
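Curves I and II can be tabulated with a short Python sketch. It assumes, as the wording about "the path containing the differentiator" suggests but Figure 21 itself cannot confirm here, that the overall transmission is the present-position path (unity) plus the differentiator path multiplied by $t_f$, with the smoothing network placed in the differentiator path. The function name `attenuation_nepers` is illustrative.

```python
import math

def attenuation_nepers(omega, t_f, alpha=None):
    """Attenuation A = -ln|T| for T = 1 + (rate path).

    omega: angular frequency; t_f: prediction time.
    alpha: time constant of an assumed smoothing network 1/(1 + i*omega*alpha)
           in the rate path; None means no smoothing (curve I).
    """
    rate_path = 1j * omega * t_f                   # differentiator times t_f
    if alpha is not None:
        rate_path /= (1.0 + 1j * omega * alpha)    # simple low-pass smoothing
    return -math.log(abs(1.0 + rate_path))

t_f = 1.0
for omega in (0.1, 1.0, 10.0, 100.0):
    curve_1 = attenuation_nepers(omega, t_f)
    curve_2 = attenuation_nepers(omega, t_f, alpha=t_f)
    print(omega, curve_1, curve_2)
```

Under these assumptions the unsmoothed gain grows without limit with frequency, while with $\alpha = t_f$ the gain levels off at a factor of 2, which matches the statement that the negative attenuation at high frequencies is much reduced.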
Finally, curve IV shows the result obtainable when there is also a filter in the present-position circuit (as shown by the broken lines in Figure 21), so that there may be a net positive attenuation at high frequencies.

[Figure 22. Attenuation characteristics of prediction circuit shown in Figure 21.]

In view of the inverse frequency scale in (3), the gross negative attenuation will be minimized if the negative attenuation region is placed very close to zero frequency. This, however, means that much of the signal energy falls in the negative attenuation region so that in certain respects, at least, the signal response must be seriously injured. For example, in the specific circuits just discussed we can place the negative attenuation region at very low frequencies by choosing very long time constants, $\alpha$, in the data-smoothing networks, with the consequence that the circuits will operate correctly for any long continued straight line path, but will be very sluggish in changing from one straight line to another. If the negative attenuation region is placed at higher frequencies, on the other hand, the signal response is improved but beyond certain limits the circuit becomes unbearably sensitive to noise.

Quantitative illustrations of these relationships are quickly constructed. Suppose, for example, that the prediction time is 2 seconds. From (3) this is consistent with an attenuation characteristic having zero attenuation below $\omega = 1$ and a net gain of $\pi$ nepers thereafter. In other words, the amplitudes of all frequencies above $\omega = 1$ are increased by a factor of about 22 to 1. If the region of added gain is pushed to a higher frequency or concentrated within a narrow band, the multiplying factor rapidly becomes larger. For example, if we maintain $A$ at approximately zero below $\omega = 2$, the average gain above this point must be $2\pi$ nepers, corresponding to a multiplying factor of 600 to 1.
We secure the same factor by attempting to concentrate the region of negative attenuation in the band between $\omega = 1$ and $\omega = 2$. The multiplying factor also goes up rapidly as we increase the prediction time. For example, with the gain uniformly spread over the frequency region above $\omega = 1$ the multiplying factor is 500 for a prediction time of 4 seconds, or more than 10,000 for a prediction time of 6 seconds.

Reasonable multiplying factors with long prediction times can be obtained only by carrying the negative attenuation region to very low frequencies. As indicated previously, the cost of this is an increase in the time required for the signal to change from one constant or nearly constant value to another. For example, in the first illustration above, if the region of $\pi$ nepers net gain is carried down from $\omega = 1$ to $\omega = 0.2$ the integral in (3) is just five times as great as it was before, so that the characteristic corresponds to a prediction time of 10 rather than 2 seconds. This change would correspond to an increase (see footnote g) from perhaps 4 or 5 to perhaps 20 or 25 seconds in the time required for the circuit to settle from one constant value to another.

Footnote g: Only rough numbers can be given, since circuits with the square-cornered attenuation characteristics chosen for illustrative purposes would have very ripply transient characteristics, corresponding to no very well marked settling time.

Practical examples of the transmission characteristics of overall prediction circuits, with particular emphasis on the dominant effect of even very small negative attenuations at extremely low frequencies, are shown later in Figures 5 to 8, inclusive. In the linear predictor, $A - A_0$ varies as $-k\omega^2$ near zero, and it is easily seen that such a term makes a finite contribution to the integral in (3).
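The multiplying factors in the quantitative illustrations above follow from equation (3) with a uniform gain over a single band. The sketch below solves (3) for that gain and converts nepers to an amplitude ratio; the function name `required_gain_nepers` and the list of cases are illustrative, and the results are printed rather than asserted in comments so that they can be compared directly with the round figures quoted in the text.

```python
import math

def required_gain_nepers(prediction_time, w_lo, w_hi=math.inf):
    """Uniform gain (nepers) over [w_lo, w_hi) giving the stated lead via (3).

    Equation (3) with A - A0 = -gain on the band gives
        prediction_time = (2/pi) * gain * (1/w_lo - 1/w_hi).
    """
    upper = 0.0 if math.isinf(w_hi) else 1.0 / w_hi
    inverse_band = 1.0 / w_lo - upper     # integral of dw / w**2 over the band
    return prediction_time * math.pi / (2.0 * inverse_band)

cases = [
    (2.0, 1.0, math.inf),   # 2 s lead, gain spread above w = 1
    (2.0, 2.0, math.inf),   # same lead, gain pushed above w = 2
    (2.0, 1.0, 2.0),        # same lead, gain confined to 1 <= w < 2
    (4.0, 1.0, math.inf),   # 4 s lead
    (6.0, 1.0, math.inf),   # 6 s lead
    (10.0, 0.2, math.inf),  # 10 s lead with the gain carried down to w = 0.2
]
for t_p, lo, hi in cases:
    gain = required_gain_nepers(t_p, lo, hi)
    print(t_p, lo, hi, gain / math.pi, math.exp(gain))
```

The fourth printed column is the gain in units of $\pi$ nepers, and the last column is the corresponding amplitude multiplying factor.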
On the other hand, the attenuation of the quadratic predictor, which is capable of dealing exactly with polynomial functions of time of the second degree or less, is necessarily zero at the origin (see footnote h) to terms of the order of $\omega^4$, so that the integral in this region can be neglected. This slight difference between the two characteristics at frequencies of the order of 0.01 cycle per second and below is sufficient to balance the obviously greater negative attenuation of the quadratic predictor at higher frequencies.

Footnote h: See Quasi-Distortionless Prediction Networks in Appendix A.

Chapter 9

THE ASSUMPTION OF ANALYTIC ARCS

The discussion in the previous two chapters has been based upon the assumption that the least squares criterion forms a suitable measure of performance for a predicting network. This assumption permitted us to restrict our attention to the amplitude spectra of the signal and noise, leaving phase relations entirely out of account. Thus, both signal and noise could be thought of as "random noise" functions characterized by random phases and Gaussian distributions, as described in the preceding chapter. So far as the noise is concerned, there seems to be nothing wrong with this assumption. In the case of the signal, however, it appears that significant phase relations may exist. This chapter will consequently set up an alternative analysis which permits the significance of possible phase relations in the target paths to be estimated.

The alternative analysis is based upon the assumption that the target courses are sequences of analytic segments of different lengths joined together. These segments are simple predictable curves such as straight lines, parabolas, and circles. Significant phase relations are implied by the assumption that there are sudden changes from one type of course to another.

This picture of target paths is, of course, extreme.
There are no such sharp discontinuities between one segment and another, nor do airplanes fly perfectly along simple curves even for limited periods. Nevertheless, it is the conception of target courses upon which the rest of our analysis is based. The reasons for believing that it is a closer approximation to actual target courses than, say, a random noise function with the same power spectrum would be, are given later. Perhaps more important is the fact that the possibility of hitting an airplane flying along such a simple analytic arc is much greater than it would be if we were attempting to predict a corresponding random noise function. It is thus advantageous to take the analytic arc assumption as a basis for designing the prediction circuit, even if the assumption seems to be reasonably well justified over only occasional segments of actual target paths. An example of such a situation is furnished by the bombing run illustration described in Chapter 7.

As a corollary to the analytic arc assumption it is also assumed that the theoretical predicted point must be quite close to the actual target position if the probability of scoring a hit is to be appreciable. In other words, such dispersive factors as random errors in computer or gun or the lethal radius of the shell, which would tend to produce occasional hits at long distances from the theoretical predicted point, are quite small. This is such a plausible assumption in the light of present-day antiaircraft experience that its critical importance in the present argument is likely to go unperceived. However, this is the assumption which limits consideration to small errors in prediction, whereas the least squares criterion naturally gives greatest emphasis to large errors.
If, for example, antiaircraft projectiles were suddenly endowed with a much greater destructive radius, we would be much more interested in fairly large misses, and the objections to the least squares criterion would disappear.

These postulates are discussed in more detail in the following sections. In anticipation of this discussion the following conclusions may be mentioned:

1. With the assumptions as stated, the prediction should be on a modal rather than a least squares basis. In other words, the gun should be aimed at the most probable future position of the target.
2. Modal prediction requires evaluation of the parameters of the analytic arc the target is at present traversing. This can be accomplished by smoothing the values of these parameters evaluated for a period in the past.
3. If the smoothing is performed by linear invariable networks, the impulsive admittances of these networks should have a definite cutoff after a finite smoothing time. By this means all data over a certain age are given zero weight. The method of calculating the proper smoothing time is developed.
4. Definite advantages can be obtained from circuits with variable smoothing times if such systems can be satisfactorily mechanized.

THE TARGET COURSES

The target courses, like the tracking errors, can be thought of as a statistically generated set of functions, that is, a stochastic process. The structure of this process is, however, very different from that of the tracking errors. It is by no means satisfactory to assume the target courses to be equivalent to a random noise having the same power spectrum as the target courses. As we pointed out in Chapter 7, the target is piloted by a purposeful human being. It tends to follow a definite simple curve for a period of time and then to shift to a new simple curve. Much of the flight is in attempted straight lines with constant velocity.
Most of the remainder can be considered to be segments of circles or helices in space, or as segments of parabolas or higher degree curves. Straight line constant speed flight corresponds to the airplane controls in a neutral position. The helical flight is a natural generalization allowing arbitrary, but fixed, positions of the controls. The curves which are parabolic functions of time correspond to constant acceleration in the three space coordinates. Thus, all these assumptions have a reasonable physical background.

Most antiaircraft computers are constructed on the assumption of straight line flight, although some work has been done in World War II on curved flight directors both with the helical and the parabolic assumptions. There is not a great deal of difference in these two generalizations from the practical point of view, since determination of acceleration terms is subject to such large errors in any case.

The important part of this representation of the target courses is that they consist of segments of simple analytic curves joined together. The individual segments are completely predictable if we have a part of the segment given exactly. One need merely evaluate the parameters of the segment from the given part and evaluate the curve at the future time, $t_f$ seconds ahead. The unpredictable part of the target courses is due to the possibility of sudden changes from one segment to another. With random noise functions the unpredictableness occurs continuously.

This simplified description of the target courses as piecewise analytic functions must be recognized as only a first approximation. A more complete description of the target course would include the "fine structure," the connecting curves between the various analytic segments and the deviations from the segments due to random air disturbances and similar causes.
This latter effect, the wandering of the target from its intended path, might be reasonably well represented by the addition of a random noise function to the piecewise analytic functions described above.

THE POISSON DISTRIBUTION OF SEGMENT END POINTS

The analytic segments of which the course is supposed to consist are not all of the same duration; we may assume some probability distribution of the duration of these segments. The simplest assumption here is that the breaks occur in a Poisson distribution in time. This assumption is not necessary for our analysis but is a reasonable one and leads to a simple mathematical treatment. Any other reasonable distribution would give comparable results.

A series of events is said to occur in a Poisson distribution in time if the periods between successive events are independent in the probability sense and are controlled by a distribution function

$$p(l)\,dl = \frac{1}{a}\,e^{-l/a}\,dl .$$

Here $p(l)\,dl$ is the probability of an interval of length between $l$ and $l + dl$. This means that the frequency of intervals of a given length is a decreasing exponential function of the length. This type of distribution is familiar in physics as describing the decay of radioactive substances. The time $a$ in the distribution function is the average length of the intervals, since

$$\int_0^{\infty} \frac{l}{a}\,e^{-l/a}\,dl = a .$$

It is related to the "half life" $b$ of the interval by

$$b = a \ln 2 .$$

The single number $a$ completely specifies the Poisson distribution. The events may be said to be happening as randomly as possible apart from the fact that they occur at an average rate of $1/a$ per second.

Another way of describing a Poisson distribution of events is the following. The probability of an event in a small interval of duration $dl$ is $(1/a)\,dl$ and is independent of whether or not events have occurred in any other nonoverlapping intervals.
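The properties of the interval distribution stated above can be verified numerically. The following Python sketch uses only the density given in the text; the function names, the midpoint-rule integration, and the sample value of $a$ are illustrative assumptions.

```python
import math

def interval_density(l, a):
    """p(l) = (1/a) * exp(-l/a), the density of interval lengths."""
    return math.exp(-l / a) / a

def survival(l, a):
    """Probability that an interval is longer than l (integral of p from l up)."""
    return math.exp(-l / a)

def half_life(a):
    """Length b exceeded by exactly half of the intervals: b = a * ln 2."""
    return a * math.log(2.0)

def mean_interval(a, steps=200000, span=40.0):
    """Midpoint-rule estimate of the integral of l * p(l) dl over 0..span*a."""
    h = span * a / steps
    total = 0.0
    for k in range(steps):
        l = (k + 0.5) * h
        total += l * interval_density(l, a)
    return total * h

a = 5.0   # assumed average segment length, in seconds
print(mean_interval(a))            # numerically close to a
print(survival(half_life(a), a))   # one half, by the definition of b
```

The survival function is also the probability that no break occurs within a stretch of the given length, which is the quantity that matters when a prediction must remain valid over the time of flight.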
THE PROBABILITY DISTRIBUTION OF FUTURE POSITIONS

Let us suppose that we have a record of the course of the target up to the present time and a complete statistical description of the set of target courses. What can then be said about the position of the target $t_f$ seconds from now? If we were able to analyze the data completely the most we could obtain would be a probability distribution function for the future position. This distribution function would give the probability, in the light of the course history, of the target being at any point in space at the future time. This function would assume large values at likely points and low values at unlikely points. For $t_f$ small the distribution would be highly concentrated and for larger $t_f$ it would tend to spread out.

In the simple case we have been discussing, of a Poisson distribution of sudden changes in type of course, the distribution consists of two parts. First, there is a spike of probability at one point, the continuation of the present predictable segment. Second, there is a continuous distribution which corresponds to possible changes to a new segment during the time of flight. As $t_f$ increases the total probability in the spike decreases exponentially toward zero, and the total in the continuous part increases exponentially toward unity. The behavior is roughly as indicated in Figure 1.

[Figure 1. Probability distribution of future position of target, assuming piecewise analytic courses.]

A very different type of future position distribution is exhibited with other assumptions about the target courses. For example, suppose the courses were random noise functions with the power spectrum

$$P(\omega) \propto \frac{1}{a^2 + \omega^2} .$$

A typical noise function with this spectrum is shown in Figure 2. In Figure 3 is shown a typical velocity under the other assumption, that the courses are piecewise analytic and in fact straight lines between breaks.
If the breaks are Poisson distributed, both Figure 2 and Figure 3 have the same power spectrum, $1/(a^2 + \omega^2)$. The future distribution of velocities for Figure 3 is shown in Figure 1, and for Figure 2, it will be as shown in Figure 4. In the random noise case the future distribution is a Gaussian distribution with no spike. The center of this distribution decreases exponentially toward zero with increasing time of flight according to the formula

$$X_{t_f} = X_0\,e^{-a t_f} ,$$

where $X_0$ is the present value of the function and $X_{t_f}$ is the mean of the future distribution.

[Figure 2. Typical noise function.]

[Figure 3. Typical velocity function.]

The standard deviation $\sigma$ of the distribution increases exponentially toward the rms value of the function according to

$$\sigma = A\left(1 - e^{-2 a t_f}\right)^{1/2} .$$

[Figure 4. Probability distribution of future position of target, assuming courses with random noise properties.]

Supposing that this distribution function could be determined, where should the gun be aimed? The answer to this will depend on two factors: the gun dispersion, and the lethal effects of the shell. If the gun is aimed to explode the shell at a certain point in space, the shell will not necessarily explode at that point, but rather there will be a distribution of positions centered about the point aimed at, because of gun dispersion. Also, if the shell explodes at a certain point and the target is at another point, there will be a certain probability of lethal effect which decreases rapidly with increasing distance between the points. These two functions could be combined by a product integration to give the probability of lethal effect if the target is at one point and the gun aimed to explode the shell at a second point.
To determine the probability of a hit when aiming at a certain point, then, we should multiply the probability of the target being at each point in space by the probability of lethal effect when it is at that point and integrate the product over all space. The optimum point of aim will be the one which maximizes this integrated product.

In one dimension this may be expressed mathematically as follows. Let $P(x)$ be the future position distribution of the target, so that $P(x)\,dx$ is the probability of it being in the interval from $x$ to $x + dx$ at the future time. Let $Q(x,y)$ be the probability of hitting the target if the gun is aimed at point $y$ and the target is at point $x$. Then the total probability of a hit when aiming at point $y$ is

$$R(y) = \int_{-\infty}^{\infty} P(x)\, Q(x,y)\, dx .$$

The point of aim $y$ should be chosen to maximize $R(y)$.

In the cases we consider, the lethal radius of the shell and the dispersion of the gun are both assumed to be small in comparison with the range of future positions if there is a change of course during the time of flight. This means that $Q(x,y)$ is small unless $x$ is very near to $y$. $Q(x,y)$ can be, in fact, considered to be a $\delta$ function of $(x - y)$, and the value $R(y)$ is then just a constant times $P(y)$. Thus, the best aiming point under this assumption is the most probable future position of the target. The assumption of small lethal distance is generally valid with antiaircraft fire and ordinary chemical explosive shells.

Now the most probable future position in our case is the spike of probability corresponding to the analytic extrapolation of the present segment of the target course. To determine its position one must find the parameters of this segment and evaluate for $t_f$ seconds in the future. For example, if the segments are assumed to be straight lines (constant velocity target), the velocity components are determined and multiplied by $t_f$ to give the predicted change in position.
These changes are added to the present position to give the future position. If helical or parabolic segments are assumed, the parameters of these curves are determined from the past data, and the curves extrapolated $t_f$ seconds into the future.

These conclusions may be contrasted with the idea of aiming at the point which minimizes the mean square error. The least squares criterion amounts to aiming at the mean or center of gravity of the future distribution of position. This point will ordinarily be under the continuous part of the distribution and not at the spike; e.g., the point marked in Figure 1. Its position depends to a considerable extent on distant parts of the distribution, which would surely be complete misses in any case. The chief advantage of the least squares criterion is that it fits in well with the mathematical tools suitable to these problems, leading to solvable equations. The least squares criterion will still appear in our analysis in that we attempt to smooth our course parameters in such a way as to minimize the mean square error in these, a very different thing from minimizing the mean square error in the predicted position of the target.

NECESSITY OF A SHARP CUTOFF

The changes in the course parameters between adjacent segments can be very large. Also, at the start of operations and in changing from one target to another there will be large and erratic variation of the input to the smoothing and predicting circuits, unrelated to the present target course. If any of these data are used in prediction, the result will almost surely be a miss because of the small lethal radius of the shell. The only way to eliminate these errors in a linear invariable system is to have all weighting functions cut off sharply after a short time. Then all data over a certain age are eliminated. Hits will occur only when the target has been on a predictable segment for this length of time or more and remains there at least $t_f$ seconds in the future.
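The effect of a small lethal radius on the choice of aiming point, and the contrast between aiming at the spike and aiming at the mean, can be illustrated with a one-dimensional sketch. This is not the report's computation: it assumes, purely for illustration, a future-position distribution made of a spike plus a Gaussian continuous part, and a lethal-effect function $Q$ equal to 1 inside a fixed radius and 0 outside. The function names and all numerical values are hypothetical.

```python
import math


def gaussian_cdf(x, mu, sigma):
    """Cumulative distribution of a normal variable (used for the continuous part)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def hit_probability(y, spike_mass, spike_pos, mu, sigma, lethal_radius):
    """R(y) = integral of P(x) Q(x, y) dx for the assumed spike-plus-Gaussian P.

    Assumption: Q(x, y) = 1 when |x - y| <= lethal_radius, else 0.
    """
    spike_term = spike_mass if abs(y - spike_pos) <= lethal_radius else 0.0
    continuous_term = (1.0 - spike_mass) * (
        gaussian_cdf(y + lethal_radius, mu, sigma)
        - gaussian_cdf(y - lethal_radius, mu, sigma)
    )
    return spike_term + continuous_term


def mean_position(spike_mass, spike_pos, mu):
    """Least squares aiming point: the center of gravity of the distribution."""
    return spike_mass * spike_pos + (1.0 - spike_mass) * mu


if __name__ == "__main__":
    # Hypothetical numbers: 40 per cent of the probability stays in the spike.
    params = dict(spike_mass=0.4, spike_pos=0.0, mu=300.0, sigma=200.0,
                  lethal_radius=10.0)
    y_mean = mean_position(0.4, 0.0, 300.0)
    print("aim at spike:", hit_probability(0.0, **params))
    print("aim at mean :", hit_probability(y_mean, **params))
```

Under these assumed values the spike carries almost all of the achievable hit probability, while the mean falls under the thin continuous part, which is the behavior the text describes.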
Suppose the weighting function for velocity has a 1 per cent tail beyond the cutoff point and that the trackers start following the target from a zero position. Then after the smoothing time there will be, because of the lack of exact cutoff, a 1 per cent error in velocity. If the time of flight were 15 seconds and the target velocity 200 yards per second, this represents an error of 30 yards in predicted position (0.01 x 200 x 15). Since this is comparable to the other errors in a typical director, we conclude that the tail of the smoothing curve should not be much greater than 1 per cent of its total area.

9.5 CALCULATION OF THE BEST SMOOTHING TIME

Under the assumptions we have made, the proper smoothing time to maximize the number of hits can be determined as follows. Let $P(l)$ be the probability that a predictable segment of the course lasts for $l$ seconds or more. In the Poisson case this function is

$$P(l) = e^{-l/a} .$$

With a given smoothing time $S$ there will be a certain probability of hitting the target, assuming it has been on the present segment for $S$ seconds in the past and will remain there for $t_f$ seconds in the future. We assume changes in course to be so large that any change results in a miss. This probability of a hit $Q(S)$, provided it remains on the course, will be an increasing function of $S$. Ordinarily the standard deviation will decrease inversely as the square root of the smoothing time. We have assumed the lethal radius of the shell small compared to the dispersion of shells about the target. The probability of a hit will then vary inversely with the volume through which the shells are dispersed. If the gun itself had no dispersion but all errors were due to tracking errors (and if the tracking error spectrum is flat), the probability of a hit would then vary as $K S^{3/2}$ for $S$ in the region of interest. This is because there are three dimensions and the expected error in each of these is decreasing as $S^{-1/2}$.
With gun dispersion present, $Q(S)$ will have the form

$$Q(S) = K \left( \sigma_1^2 + \sigma_2^2\, \frac{a}{S} \right)^{-3/2}$$

where $\sigma_1$ is the standard deviation due to the gun dispersion, and $\sigma_2 \sqrt{a/S}$ that due to tracking errors. The sum of the squares is the total variance in each dimension and the three-halves power gives the total dispersion volume.

When these two functions $P(l)$ and $Q(S)$ are known, the best smoothing time is that which maximizes the product

$$P(S + t_f) \cdot Q(S) .$$

The first term is the probability of a predictable segment of the course lasting $S + t_f$ seconds, and the second term is the probability of a hit if it does last that long. Therefore, the product is the probability of a hit with smoothing time $S$. In the Poisson case, with no gun dispersion, the calculation is as follows:

$$P(l) = e^{-l/a}$$
$$P(S + t_f) = e^{-(S + t_f)/a} = A\, e^{-S/a}$$
$$Q(S) = K S^{3/2}$$
$$f(S) = P(S + t_f)\, Q(S) = B\, e^{-S/a}\, S^{3/2}$$
$$f'(S) = B \left[ \tfrac{3}{2}\, S^{1/2}\, e^{-S/a} - \tfrac{1}{a}\, e^{-S/a}\, S^{3/2} \right] = 0$$
$$S = \tfrac{3}{2}\, a$$

The proper smoothing time is 3/2 of the average segment length, and is independent of the time of flight and all other factors.

The presence of gun dispersion and computer errors which are independent of smoothing time decreases the best $S$ from this value. In this case the equation for optimal $S$ is the quadratic

$$\sigma_1^2 \left( \frac{S}{a} \right)^2 + \sigma_2^2\, \frac{S}{a} - \frac{3}{2}\, \sigma_2^2 = 0 ;$$

hence

$$\frac{S}{a} = \frac{-\sigma_2^2 + \sigma_2 \sqrt{\sigma_2^2 + 6 \sigma_1^2}}{2 \sigma_1^2} .$$

Here $\sigma_1$ is the part of the errors which is independent of smoothing time (dispersion errors in the computer, etc.) and $\sigma_2$ is the error which varies inversely with the square root of $S$, $\sigma_2$ being its value at $S = a$. Ordinarily one of these is several times the other, in which case an approximate expression for $S/a$ holds; which error dominates, and the approximate expression itself, are illegible in the scan.

There are other factors which we have neglected, which decrease the best smoothing time still further. The wandering of the target about the predictable segments assumed in the above simplified analysis makes old data less reliable and therefore reduces $S$.
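The optimum smoothing time for the Poisson case can be checked numerically. The sketch below assumes the hit probability has the form $e^{-S/a}\,(\sigma_1^2 + \sigma_2^2 a/S)^{-3/2}$ up to factors that do not depend on $S$; the function names and the sample values are hypothetical. It compares the root of the corresponding quadratic with a brute-force search over $S$.

```python
import math


def relative_hit_probability(S, a, sigma1, sigma2):
    """P(S + t_f) * Q(S), dropping constant factors (assumed form, see lead-in)."""
    variance = sigma1 ** 2 + sigma2 ** 2 * a / S
    return math.exp(-S / a) * variance ** -1.5


def best_smoothing_time(a, sigma1, sigma2):
    """Root of sigma1^2 x^2 + sigma2^2 x - (3/2) sigma2^2 = 0 with x = S / a."""
    if sigma1 == 0.0:
        return 1.5 * a  # no gun dispersion: S = 3a/2
    x = (-sigma2 ** 2 + sigma2 * math.sqrt(sigma2 ** 2 + 6.0 * sigma1 ** 2)) / (
        2.0 * sigma1 ** 2
    )
    return x * a


def numeric_best_smoothing_time(a, sigma1, sigma2, steps=20000):
    """Brute-force scan of S over (0, 5a] for the largest hit probability."""
    best_S, best_value = 0.0, -1.0
    for i in range(1, steps + 1):
        S = 5.0 * a * i / steps
        value = relative_hit_probability(S, a, sigma1, sigma2)
        if value > best_value:
            best_S, best_value = S, value
    return best_S


if __name__ == "__main__":
    a = 10.0  # hypothetical mean segment length, seconds
    for s1, s2 in [(0.0, 2.0), (1.0, 2.0), (4.0, 1.0)]:
        print(s1, s2, best_smoothing_time(a, s1, s2),
              numeric_best_smoothing_time(a, s1, s2))
```

The scan and the closed form agree to within the grid spacing, and any nonzero $\sigma_1$ pulls the optimum below $3a/2$.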
Also, there is the tactical consideration that when starting to track a target it is desirable to commence firing as soon as possible, even if reducing this time makes individual hits somewhat less probable. For these and other reasons the best smoothing time will be just a fraction of $a$.

9.4 NONLINEAR AND VARIABLE SYSTEMS

The compromise required in choosing a certain definite smoothing time can be eliminated by the use of nonlinear elements. In particular, if a method is devised for determining when changes of course occur, this indication can be used to start a new linear but variable smoothing operation, so that the device uses all the data pertinent to the present segment and no data from previous segments. There is a clear improvement in such cases although not so great as might be expected. There are many practical difficulties in proper adjustment of such a "trigger" action. If the trigger is too sensitive it will assume new segments due merely to tracking noise and seldom allow sufficient smoothing for accurate fire. If it is too insensitive it fails in its function of quickly locating changes of segment. Since the noise and target courses are subject to considerable variation, this adjustment is not easy. In such a system the smoothing may be linear; the only nonlinearity is the tripping circuit. The analysis of best weighting functions, etc., given in later chapters can for the most part be applied to such cases.

There may also be advantages to be derived from making the smoothing operator depend on the general position in space of the target relative to the gun. The smoothing time may be varied, for example, as a function of the time of flight. This type of variation would be slow compared to the noise frequency, and here again the linear analysis can be used.
Whether any real advantage can be obtained by "strongly" nonlinear smoothing in practical cases other than these two possibilities is questionable.

Chapter 10

SMOOTHING FUNCTIONS FOR CONSTANTS

The analytic arc assumption described in the previous chapter immediately allows us to reduce a vast proportion of data-smoothing problems to a relatively concrete form. Obviously the arc will be specified by a number of parameters, and the principal object of the computing and data-smoothing circuits must be to isolate values of these parameters on the basis of which a prediction can be made. In practical cases the instantaneous values of the parameters are isolated by coordinate converters. The function of the data-smoothing circuit is to provide a suitable average from these instantaneous values. This is called "smoothing a constant" here since the parameters are assumed to be constant along each arc, although they may change radically from one arc to another.

The data-smoothing network is most conveniently specified by its impulsive admittance. (See Appendix A.) In accordance with the assumptions made in the previous chapter, it will be assumed that the desired impulsive admittance is identically zero after some limiting time $T$. Thus, $T$ seconds after a change from one analytic arc to the next the new parameter value is established. $T$ is the so-called "settling time" of the data-smoothing network.

With the settling time limit given, the problem of choosing a suitable data-smoothing network reduces to that of finding the best shape of the impulsive admittance characteristic for $t < T$. Obviously this shape determines how the output of the network changes in going from the parameter value appropriate for the first arc to that appropriate for the second. The exact way in which the response settles from one constant value to the next is, however, usually of comparatively little interest.
The shape of the weighting function is of importance chiefly because of its effect on the noise. For each noise spectrum there is, in principle, an optimum shape for the weighting function. The present chapter approaches the problem of choosing a shape which will minimize the effect of noise from several points of view.

It should be noted that the term noise as used here does not necessarily refer to the errors associated directly with the tracking data. The tracking data may have been subjected to coordinate conversions, differentiations, or other processes of computation before reaching the data-smoothing network.[a] The noise associated with the signal to be smoothed thus will usually have characteristics differing from those of the noise associated with the tracking data.

[a] In exceptional circumstances the physical apparatus in which these processes are carried out may also be sources of additional noise.

10.1 EXPONENTIAL SMOOTHING

Before attacking the problem of smoothing a constant in a systematic way it is worth while to consider an important special case. This is the so-called exponential smoothing circuit. It leads to a data-smoothing network in which the output $V$ is related to the input $E$ by an integral of the form

$$V(t) = \int [\ldots]\; E(t - \tau)\, d\tau$$

(the kernel is illegible in the scan), so that the impulsive admittance $W(t)$ is an exponential function of time, as illustrated by Figure 1.

Figure 1. Simple exponential weighting function.

An impulsive admittance of the type shown in Figure 1 does not show any very definite settling time. The exponential curve approaches zero gradually, and it is a long time after a change in course before the effects of the data obtained on the old course are negligible. This is obviously an undesirable result, and the exponential weighting function is consequently not a recommended one for situations to which the analytic arc assumption applies.
The exponential solution is, however, described here because it occurs in such a vast variety of cases. It is found, in fact, whenever the data-smoothing device is specified by a linear first-order differential equation with constant coefficients. It may thus correspond to many simple situations. For example, this is the result which would be obtained in an electrical circuit if we smoothed the data by placing a simple shunt capacity across a resistance circuit. In mechanical structures it is encountered whenever the damping depends either upon simple inertia or a simple compliance.

Simple exponential smoothing also occurs in a variety of other situations which may be somewhat less obvious. For example, it is the effective result in either an aided laying or a regenerative tracking scheme whenever the ratio between rate and displacement corrections is fixed. Another somewhat similar example is furnished by the feedback amplifier circuit shown in Figure 2. Since rapid fluctuations in the output of this amplifier are fed back through the capacity and tend to oppose the input voltage, the structure acts as a smoother, and more detailed analysis would show that it has characteristics similar to those obtained by using a shunt capacity across a resistance circuit. The structure is introduced here because considerable use is made of it in connection with the discussion of nonlinear smoothing in a later chapter.

Figure 2. Feedback amplifier circuit giving simple exponential weighting function.

One simple conclusion about data-smoothing networks can be drawn immediately from this discussion. Since all structures simple enough to be specified by a first-order differential equation give exponential smoothing, which has no very well-marked settling time, it is clear that a data-smoothing network which shows a well-defined settling time must probably be at least moderately complicated.
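The lack of a definite settling time can be made concrete by asking how much of the total weight an exponential weighting function places on data older than a given age. The sketch below assumes a unit-area exponential $W(t) = e^{-t/t_c}/t_c$, where the time constant $t_c$ and the function names are illustrative choices and not symbols from the text, and compares it with a uniform weighting function that is zero beyond its window.

```python
import math


def exponential_tail(age, time_constant):
    """Fraction of the area of W(t) = exp(-t/tc)/tc lying beyond `age`."""
    return math.exp(-age / time_constant)


def uniform_tail(age, window):
    """Fraction of the area of a uniform weighting on (0, window) beyond `age`."""
    return max(0.0, 1.0 - age / window)


def age_for_tail(fraction, time_constant):
    """Age beyond which the exponential weighting keeps only `fraction` of its area."""
    return -time_constant * math.log(fraction)


if __name__ == "__main__":
    tc = 1.0
    # Time constants needed before only 1 per cent of the weight rests on old data.
    print("ages in time constants for a 1 per cent tail:", age_for_tail(0.01, tc))
    # A uniform window has no tail at all once its length has elapsed.
    print("uniform tail beyond its window:", uniform_tail(5.0, 4.0))
```

With these assumptions the exponential needs about 4.6 time constants to bring its tail down to 1 per cent of the total area, whereas a weighting function with a sharp cutoff reaches zero exactly at its settling time.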
10.2 CURVE-FITTING METHOD

Consider the signal $E$ shown in Figure 3 under the assumption that the true signal is constant and the superposed noise is random with a flat spectrum.

Figure 3. Piecewise constant signal with noise.

The best constant $A$, in the least squares sense, which can be fitted to the signal from $t - T$ to $t$ is that which minimizes

$$\int_{t-T}^{t} \left[ A - E(\lambda) \right]^2 d\lambda ,$$

viz.,

$$A = \frac{1}{T} \int_{t-T}^{t} E(\lambda)\, d\lambda . \qquad (1)$$

Comparing this with equation (2), Appendix A, it will be seen that $A$, which is obviously a function of $t$, is the response to the assumed signal of a network whose impulsive admittance is

$$W(t) = \frac{1}{T}, \qquad 0 < t < T . \qquad (2)$$

This is the best weighting function for smoothing under the assumed circumstances. It is illustrated in Figure 4.

Figure 4. Best weighting function for smoothing piecewise constant signal.

A more complex situation is one in which the true signal is a line of constant slope with superposed flat random noise, as shown in Figure 5. For convenience the analysis will be conducted in terms of the age variable $\tau = t - \lambda$.

Figure 5. Piecewise linearly varying signal with noise.

The best straight line $A - B\tau$ which can be fitted to the signal from $\tau = 0$ to $\tau = T$ is that which minimizes

$$\int_{0}^{T} \left[ A - B\tau - E(t - \tau) \right]^2 d\tau .$$

Hence $A$ and $B$ must satisfy simultaneously

$$A - \frac{T}{2}\, B = \frac{1}{T} \int_0^T E(t - \tau)\, d\tau$$
$$\frac{T}{2}\, A - \frac{T^2}{3}\, B = \frac{1}{T} \int_0^T \tau\, E(t - \tau)\, d\tau \qquad (3)$$

Eliminating $A$, we get

$$B = \frac{12}{T^3} \int_0^T \left( \frac{T}{2} - \tau \right) E(t - \tau)\, d\tau$$

whence by partial integration

$$B = \frac{6}{T^3} \int_0^T E'(t - \tau) \cdot \tau (T - \tau)\, d\tau$$

where the prime denotes the time derivative. Comparing this with (7), Appendix A, it will be seen that $B$, which is obviously a function of $t$, is the response to the derivative of the assumed signal of a network whose impulsive admittance is

$$W(t) = \frac{6}{T^3}\, t\, (T - t), \qquad 0 < t < T . \qquad (4)$$

This is the best weighting function for smoothing the derivative of the signal under the assumed circumstances. It is illustrated in Figure 6 and is generally referred to as the "parabolic weighting function."

It should be noted also that the right-hand member of the first of equations (3) is formally the same as that of equation (1).
Hence the response of the network specified by (2) and illustrated in Figure 4, to the type of signal shown in Figure 5, will correspond to the value on the best straight line $T/2$ seconds back from $t$, the present time. This network is still the best for smoothing the signal, but it introduces a delay of one half of the smoothing time. The delay may be reduced only at the price of a reduction in smoothing unless the smoothing time is increased.

Figure 6. Best weighting function for smoothing piecewise linearly varying signal.

AUTOCORRELATION METHOD

The autocorrelation method with finite settling time was first used by G. R. Stibitz in numerical determination of the best weighting function for smoothing the derivative of tracking data with typical tracking errors. This method was also used to determine the sensitivity of smoothing to departures of the weighting function from the best form.

The analysis is based upon the expression

$$V(t) = \int_0^T g'(t - \tau)\, W(\tau)\, d\tau , \qquad t > T ,$$

for the response to the derivative of the error time function $g(t)$ of a network whose impulsive admittance or weighting function $W(t)$ is identically zero for $t > T$ as well as for $t < 0$. Since measured tracking errors are generally tabulated only at 1-second intervals, the integral may be approximated by the sum

$$V_t = \sum_{m=1}^{T} g'_{t-m+(1/2)}\; W_{m-(1/2)}$$

for integral values of $t$. The instantaneous transmitted power is the square of this expression, and the average transmitted power is

$$P_{av} = \lim_{N \to \infty} \frac{1}{N} \sum_{t=1}^{N} V_t^2 .$$

This may be expressed in the form

$$P_{av} = \sum_{m} \sum_{n} W_{m-(1/2)} \cdot C_{m-n} \cdot W_{n-(1/2)} \qquad (5)$$

where

$$C_{m-n} = \lim_{N \to \infty} \frac{1}{N} \sum_{t=1}^{N} g'_{t-m+(1/2)}\; g'_{t-n+(1/2)}$$

is the autocorrelation of the errors. Having computed the autocorrelation, (5) may be minimized with respect to the $W$'s by familiar methods, under the constraint

$$\sum_{m=1}^{T} W_{m-(1/2)} = 1 .$$

The values of $W$ thus obtained are the specification of the best weighting function.[b]
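The constrained minimization of a quadratic form such as (5) can be sketched with Lagrange multipliers: for a symmetric positive-definite autocorrelation matrix $C$, the weights minimizing $W^T C W$ subject to a unit sum are proportional to $C^{-1}$ applied to a vector of ones. This is one standard reading of "familiar methods," not necessarily the procedure Stibitz used, and the autocorrelation values below are invented for illustration.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def best_weights(autocorr):
    """Weights minimizing sum_m sum_n W_m C_(m-n) W_n subject to sum_m W_m = 1.

    `autocorr[k]` is the (hypothetical) autocorrelation at lag k.
    """
    n = len(autocorr)
    C = [[autocorr[abs(i - j)] for j in range(n)] for i in range(n)]
    u = solve(C, [1.0] * n)          # Lagrange condition: C W = const * ones
    total = sum(u)
    return [v / total for v in u]    # rescale to satisfy the constraint


def transmitted_power(weights, autocorr):
    """Quadratic form of equation (5) for given weights and autocorrelation."""
    n = len(weights)
    return sum(weights[i] * autocorr[abs(i - j)] * weights[j]
               for i in range(n) for j in range(n))


if __name__ == "__main__":
    print(best_weights([1.0, 0.0, 0.0, 0.0]))        # uncorrelated errors
    print(best_weights([1.0, 0.5, 0.25, 0.125]))     # invented correlated errors
```

For uncorrelated errors the sketch returns uniform weights; for the invented correlated case the weights remain symmetric about the middle of the window.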
Equation (5) may then be used to determine the sensitivity of smoothing to departures of the weighting function from the best form.

Proceeding along this line, Stibitz found that the best weighting function for typical actual tracking errors was generally intermediate to the uniform and parabolic ones shown in Figures 4 and 6. Furthermore, Stibitz found that the difference in smoothing obtained from the best weighting function on the one hand and from the uniform or the parabolic weighting function on the other hand is negligible in practice.

The autocorrelation method was later formalized by R. S. Phillips and P. R. Weiss, who incorporated it into a theory of prediction (reference 7). A brief exposition of this formulation is given in Appendix B.

ELEMENTARY PULSE METHOD

For the purposes of this method, an elementary noise pulse is defined by a time function $F_0(t)$ which satisfies the following requirements:

1. Identically zero when $t < 0$.
2. Contains no terms which increase exponentially with time.
3. Power spectrum $N(\omega^2)$ is the same as that of the noise.

The noise is then regarded as the result of elementary noise pulses started at random. Alternatively, it may be regarded as the result of flat random noise passed through a network whose transmission function is $S(p) = L[F_0(t)]$. As a matter of fact, only $S(p)$ is required in the analysis, and this is readily determined from the relation

$$|S(i\omega)|^2 = N(\omega^2) ,$$

together with the condition that $S(i\omega)$ corresponds to the transmission function of a minimum-phase physical structure (cf. Appendix B).

The response $F(t)$ to the elementary noise pulse $F_0(t)$ of a network whose impulsive admittance is $W(t)$ is given by the operational equation

$$F(t) = S(p) \cdot W(t)$$

in accordance with the footnote in Section A.5, Appendix A.
The best form for $W(t)$ is therefore that which minimizes the integral

$$\int_0^\infty [F(t)]^2\, dt \qquad (6)$$

under the restriction

$$\int_0^T W(t)\, dt = 1, \qquad W(t) = 0 \text{ when } t > T . \qquad (7)$$

[b] The computations involved may be considerably reduced by noting the symmetry property proved in Section B.2, Appendix B.

This is as much of the elementary pulse method as we shall need in order to reconsider the cases treated in Section 10.2. For the treatment of more general cases the method is described in greater detail in Appendix B.

The minimization of the integral (6) under the restriction (7) reduces to a simple isoperimetric problem in the calculus of variations, in cases in which $S(p)$ is a polynomial in $p$. It is essential first of all, however, to note that if $S(p)$ is of degree $n$, the integral (6) will converge only if $W(t)$ is differentiable at least $n$ times. In other words, $W(t)$ must have continuous derivatives of all orders up to the $(n-1)$th inclusive, although the $n$th derivative may have finite discontinuities. In particular, if $W(t)$ is to be zero outside of $0 < t < T$, its derivatives of orders up to the $(n-1)$th inclusive must vanish at both $t = 0$ and $t = T$. These $2n$ boundary conditions must be imposed on the solution of the Euler equation, which in this case is

$$S(p)\, S(-p)\, W(t) = \lambda .$$

$\lambda$ is a constant parameter which is finally adjusted so that the restriction (7) is satisfied.

The first case treated in Section 10.2 is one in which $N(\omega^2) = 1$, whence $S(p) = 1$ and $F(t) = W(t)$. The integral (6) is a minimum under the restriction (7) if $W(t)$ is constant by intervals. The restriction (7) then requires $W(t)$ to be of the form (2).

The case of first derivative smoothing treated in Section 10.2 is one in which $N(\omega^2) = \omega^2$, whence $S(p) = p$ and $F(t) = W'(t)$. If the integral (6) is to converge at all, $W'(t)$ must not have discontinuities of impulsive or higher type; in other words, $W(t)$ must be continuous through all values of $t$.
The integral is a minimum under the restriction (7) if $W''(t)$ is constant by intervals. The restriction (7) then requires $W(t)$ to be of the form (4).

These results may be generalized immediately. In whatever way the signal to be smoothed may have been derived from the tracking data, let the power spectrum of the noise associated with it be $N(\omega^2) = \omega^{2n}$. Then $S(p) = p^n$ and $F(t) = W^{(n)}(t)$. If the integral (6) is to converge at all, $W^{(n-1)}(t)$ must be continuous through all values of $t$. The integral is a minimum under the restriction (7) if $W^{(2n)}(t)$ is constant by intervals. The restriction (7) then requires $W(t)$ to be of the form

$$W(t) = \frac{(2n+1)!}{(n!)^2}\, \frac{1}{T} \left[ \frac{t}{T} \left( 1 - \frac{t}{T} \right) \right]^n , \qquad 0 < t < T . \qquad (8)$$

It may be noted that the convergence requirements which arise in the foregoing discussion are directly related to the discussion and theorem in Section A.8, Appendix A, with respect to the relationship between discontinuities in the impulsive admittance and its derivatives on the one hand, and the ultimate cutoff characteristic of the transmission function on the other hand. The continuity of $W^{(n-1)}(t)$ is obviously required to make the transmission fall off ultimately at the rate of $6(n+1)$ db per octave against the rise of $6n$ db per octave in the noise power spectrum.

The integral (6) may also be used to evaluate the relative advantage of the best weighting function over another weighting function. As an example, consider the case where the weighting function (2) is the best. The value of the integral (6) in this case is $1/T$. If the weighting function (4) is used against the same noise, the value of the integral (6) is $6/(5T)$. Hence, as far as rms error or standard deviation is concerned, the second weighting function is $\sqrt{5/6}$ or 0.913 as efficient as the first.
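The efficiency comparison above can be reproduced exactly. The sketch below evaluates the integral of $W(t)^2$ over $(0, T)$ for the family (8) against flat noise, using the standard beta integral $\int_0^1 x^{2n}(1-x)^{2n}\,dx = ((2n)!)^2/(4n+1)!$. The function names are illustrative, and the check covers only the flat-noise case discussed in the text.

```python
from fractions import Fraction
from math import factorial, sqrt


def weight(n, t, T):
    """Weighting function of form (8); zero outside 0 < t < T."""
    if not 0 < t < T:
        return 0.0
    x = t / T
    return factorial(2 * n + 1) / factorial(n) ** 2 / T * (x * (1 - x)) ** n


def flat_noise_power(n, T=Fraction(1)):
    """Exact value of the integral of W(t)^2 over (0, T) for form (8)."""
    scale = Fraction(factorial(2 * n + 1), factorial(n) ** 2)
    beta = Fraction(factorial(2 * n) ** 2, factorial(4 * n + 1))
    return scale * scale * beta / T


def relative_efficiency(n):
    """rms efficiency of form (8) with index n relative to the uniform case n = 0."""
    return sqrt(float(flat_noise_power(0) / flat_noise_power(n)))


if __name__ == "__main__":
    print("uniform  :", flat_noise_power(0))   # 1/T with T = 1
    print("parabolic:", flat_noise_power(1))   # 6/(5T) with T = 1
    print("efficiency of parabolic against flat noise:", relative_efficiency(1))
```

The exact fractions 1 and 6/5 (for $T = 1$) give the ratio $\sqrt{5/6}$ quoted in the text.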
Chapter 11

SMOOTHING FUNCTIONS FOR GENERAL POLYNOMIAL EXPANSIONS

The theory of "smoothing a constant" developed in the preceding chapter will be extended in this chapter to the problem of smoothing a polynomial function of time of any prescribed degree. The extension is, however, restricted to the case of a flat noise spectrum. In addition to the smoothing problem, the analysis also provides a way of designing a network which will extrapolate the polynomial a given distance $t_f$ into the future. The network is so arranged that $t_f$ is continuously variable. In addition, the degree of the polynomial can readily be changed to fit changes in the complexity of the assumed form of the data, apart from noise.

It is clear that these results amount, in a certain sense, to an alternative to Wiener's method for the design of prediction circuits for general time series. Thus, to predict a time series of any given complexity we would need only to begin with a polynomial of sufficiently high degree to fit the observed data, and extrapolate. Aside from the restriction to a flat noise spectrum, perhaps the most obvious difference from Wiener's method is the fact that the settling time restriction limits the data upon which the prediction rests to a finite interval in the past. To advance such a prediction theory seriously, however, it would be necessary to go much farther into the way in which the degree of the polynomial is established and the justification for assuming that the extrapolated value represents a probable future value for the function.[a]

This general discussion will not be undertaken here. Since prediction with high degree polynomials will certainly be sensitive to minor irregularities in the data, tracking errors would necessarily limit the application of the method in any case.
If we confine ourselves to reasonably low degree polynomials, however, the method is useful. An example is furnished by the prediction of airplane position, in rectangular coordinates, by quadratic functions of time. Here the square terms represent the effects of accelerations in the various coordinates. We can defend the inclusion of such terms on the ground that it is plausible to assume that an airplane may experience constant accelerations, due to turns, the force of gravity, etc., for considerable periods of time. The linear term represents plane velocity and needs no defense. The constant term, of course, gives the plane position at some reference time. Including it in the smoothing operation is equivalent to introducing "present-position" smoothing of the sort suggested by the broken lines in Figure 1 of Chapter 7.[b]

[a] As an example of possible difficulties we may notice the fact that two polynomials of different degree which approximate a given function as closely as possible, in a least squares sense, in a prescribed interval frequently differ radically outside that interval.

Aside from its direct interest as a possible prediction method, the analysis in this chapter is also of indirect interest for the additional light it sheds on the effect of the noise spectrum on smoothing functions. It turns out that smoothing a power of time, with a flat noise spectrum, is equivalent to smoothing a constant with a somewhat different noise spectrum. Thus the smoothing functions developed for polynomials are also useful as special cases of smoothing functions applicable to constants.

11.1

Let $\lambda$ be any past value of time and let $t$ be the present value. If the data $E(\lambda)$ is fitted with a smooth curve $\bar{E}(\lambda)$, the predicted value may be taken as $\bar{E}(t + t_f)$.
The procedure of fitting is the familiar one of minimizing the integral

$$\int_{-\infty}^{t} \left[ \bar{E}(\lambda) - E(\lambda) \right]^2 W_0(t, \lambda)\, d\lambda$$

with respect to disposable parameters in $\bar{E}(\lambda)$ and a prescribed weighting function $W_0(t, \lambda)$. The lower limit of the integral is indicated as $-\infty$ in compliance with the physical impossibility of discriminating between relevant and irrelevant data, with fixed linear networks, except on the basis of age. The burden of discrimination must be relegated to the weighting function, which must be a function only of the age $t - \lambda$. Under the ideal restriction that $W_0(t - \lambda)$ is identically zero when $t - \lambda > T$ or $\lambda < t - T$, the indicated lower limit of the integral is purely nominal.

[b] In the circuit of Figure 1, Chapter 7, however, the smoothing network would produce a lag in the present-position data delivered to the prediction circuit, and this lag would, of course, mean some error in following a moving target. In the method described in this chapter such lags are automatically compensated for by adjustments in the coefficients of the other terms of the polynomial.

As in Section 10.2, it is convenient to conduct the analysis in terms of the age variable $\tau = t - \lambda$ introduced there. If

$$F(\tau) = E(\lambda), \qquad \bar{F}(\tau) = \bar{E}(\lambda) ,$$

the integral to be minimized may be expressed in the form

$$\int_0^\infty \left[ \bar{F}(\tau) - F(\tau) \right]^2 W_0(\tau)\, d\tau . \qquad (1)$$

In accordance with the discussion of quasi-distortionless transmission networks in Section A.10, Appendix A, the smooth curve $\bar{E}(\lambda)$ should be a polynomial in $\lambda$. Hence $\bar{F}(\tau)$ should be a polynomial in $\tau$. It will be more convenient, however, to express $\bar{F}(\tau)$ formally as a linear combination of polynomials in $\tau$ which may be orthogonalized. Hence, let

$$\bar{F}(\tau) = V_0 + V_1\, G_1(\tau) + V_2\, G_2(\tau) + \cdots + V_n\, G_n(\tau) \qquad (2)$$

where $G_m(\tau)$ is an $m$th degree polynomial in $\tau$.
Let $W_0(\tau)$ be normalized in the sense that

$$\int_0^\infty W_0(\tau)\, d\tau = 1$$

and the $G_m(\tau)$ be orthogonalized with respect to the weighting function $W_0(\tau)$ in the sense that

$$\int_0^\infty G_l(\tau)\, G_m(\tau)\, W_0(\tau)\, d\tau = \begin{cases} 0 & \text{if } l \neq m \\ 1/k_m & \text{if } l = m \end{cases}$$

$(G_0 = 1,\ k_0 = 1)$. The integral (1) is then a minimum with respect to the $V_m$'s in (2) if

$$V_m = k_m \int_0^\infty F(\tau) \cdot G_m(\tau) \cdot W_0(\tau)\, d\tau . \qquad (3)$$

In terms of the forward time $\lambda$, (2) and (3) reduce to

$$\bar{E}(\lambda) = V_0(t) + V_1(t) \cdot G_1(t - \lambda) + V_2(t) \cdot G_2(t - \lambda) + \cdots + V_n(t) \cdot G_n(t - \lambda) \qquad (4)$$

where

$$V_m(t) = k_m \int_{-\infty}^{t} E(\lambda) \cdot G_m(t - \lambda) \cdot W_0(t - \lambda)\, d\lambda . \qquad (5)$$

Expression (5) identifies the $V_m(t)$ as the responses to $E(\lambda)$ of fixed linear networks whose impulsive admittances are

$$W_m(\tau) = k_m\, G_m(\tau) \cdot W_0(\tau) . \qquad (6)$$

By (4), the predicted value may be obtained by a linear combination of the responses of these networks, viz.,

$$\bar{E}(t + t_f) = V_0(t) + G_1(-t_f) \cdot V_1(t) + G_2(-t_f) \cdot V_2(t) + \cdots + G_n(-t_f) \cdot V_n(t) . \qquad (7)$$

A schematic representation of an $n$th order smoothing and prediction circuit, based on (7), is shown in Figure 1, where the $G_m(-t_f)$ are represented as potentiometer factors dependent on the time of flight.

Figure 1. Schematic representation of $n$th order smoothing and prediction circuit.

Alternatively, (7) may be written

$$\bar{E}(t + t_f) = \bar{E}(t) + [G_1(-t_f) - G_1(0)] \cdot V_1(t) + \cdots + [G_n(-t_f) - G_n(0)] \cdot V_n(t) \qquad (8)$$

where $\bar{E}(t)$ is then replaced by $E(t)$ when position data smoothing is to be omitted.

It is not necessary that the $G_m(\tau)$ polynomials be orthogonal. However, the circuit switching required to reduce or increase the order of the prediction is simplest when the $G_m(\tau)$ polynomials are orthogonal. Orthogonal polynomials corresponding to any weighting function $W_0(\tau)$ are readily derived by well-known methods.
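One of the well-known methods is Gram-Schmidt orthogonalization of the powers of $\tau$. The sketch below applies it under an assumed uniform weighting function $W_0(\tau) = 1/T$ on $(0, T)$, then forms the coefficients $V_m$ of (3) and the prediction (7) for a noise-free polynomial input. The uniform $W_0$, the function names, and the sample numbers are illustrative assumptions; exact rational arithmetic is used so the result can be checked exactly.

```python
from fractions import Fraction


def poly_mul(a, b):
    """Product of two polynomials given as coefficient lists (lowest power first)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def poly_eval(p, x):
    """Value of polynomial p at x."""
    return sum(c * Fraction(x) ** k for k, c in enumerate(p))


def inner(p, q, T):
    """Integral over (0, T) of p(tau) q(tau) W0(tau), with assumed W0 = 1/T."""
    return sum(c * Fraction(T) ** k / (k + 1) for k, c in enumerate(poly_mul(p, q)))


def orthogonal_polys(n, T):
    """Monic polynomials G_0 .. G_n orthogonal with respect to the assumed W0."""
    polys = []
    for m in range(n + 1):
        g = [Fraction(0)] * m + [Fraction(1)]            # start from tau^m
        for q in polys:
            c = inner(g, q, T) / inner(q, q, T)          # projection on G_q
            g = [gi - c * (q[i] if i < len(q) else 0) for i, gi in enumerate(g)]
        polys.append(g)
    return polys


def predict(F, n, T, t_f):
    """Equation (7): sum of V_m G_m(-t_f), with V_m from equation (3).

    F holds the coefficients of the input as a polynomial in the age tau.
    """
    total = Fraction(0)
    for G in orthogonal_polys(n, T):
        V = inner(F, G, T) / inner(G, G, T)              # k_m = 1 / <G_m, G_m>
        total += V * poly_eval(G, -Fraction(t_f))
    return total


if __name__ == "__main__":
    F = [5, -3, 2]                                       # hypothetical 5 - 3 tau + 2 tau^2
    print(predict(F, 2, 4, 2))                           # second-order prediction
    print(predict(F, 1, 4, 2))                           # first-order prediction
```

With a second-order fit the quadratic input is extrapolated without error, while a first-order fit of the same data gives a different value; lowering the order only means dropping the last term of the sum, which is the switching convenience the text attributes to orthogonal $G_m$.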
The weighting function $W_0(\tau)$ may be determined by either of the methods described in Appendix B as the best weighting function for smoothing position data, under prescribed tracking error characteristics. Then the best impulsive admittances $W_m(\tau)$ for a smoothing and prediction circuit are prescribed by (6).

The relationship (6) shows that if the prescribed weighting function $W_0(\tau)$ satisfies the formal requirements for physical realizability, so will all of the impulsive admittances $W_m(\tau)$. Of the standard sets of orthogonal polynomials those of Laguerre appear to be the best adapted to physical realization. The Laguerre polynomials $L_n^{(\alpha)}(\tau)$ are orthogonal in $0 < \tau < \infty$ with the weighting function $\tau^{\alpha} e^{-\tau}$. However, such a weighting function is, in general, very unsatisfactory from the practical point of view of settling characteristics.

It is possible of course to approximate any prescribed weighting function $W_0(\tau)$ as closely as may be desired in a physically realizable form, derive a set of orthogonal polynomials based on the approximate form, and determine the impulsive admittances $W_m(\tau)$ from (6). However, such a procedure leads to complexities of network configuration which increase very rapidly with the index $m$. This increasing complexity is hardly justifiable in practice.

From the foregoing considerations, it appears that the most practical procedure is to derive all of the impulsive admittances $W_m(\tau)$ without regard to physical realizability, approximate them independently in physically realizable forms of independently prescribed complexities, and modify or redetermine the potentiometer factors in accordance with the discussion in Section A.10, Appendix A.
11.2 WEIGHTING FUNCTIONS FOR DERIVATIVES

The impulsive admittances defined by (6) for $m > 0$ may not be regarded as weighting functions even though the response of the corresponding networks to $E(\lambda)$ is, by (5),

$$V_m(t) = \int_0^{\infty} E(t - \tau)\, W_m(\tau)\, d\tau ,$$

because, with the exception of $W_0(\tau)$, the $W_m(\tau)$, as will presently be seen, cannot be normalized. The term weighting function is reserved for the functions defined by (11) below.

Since $\tau^r$ is a linear combination of the $G_s(\tau)$ where $s = 0, 1, \ldots, r$, it is obvious from (6) that

$$\int_0^{\infty} \tau^r\, W_m(\tau)\, d\tau = 0 \quad \text{when } r < m .$$

In particular

$$\int_0^{\infty} W_m(\tau)\, d\tau = 0 \quad \text{when } m > 0 .$$

Since the transmission function $Y_m(p)$ of a network is the Laplace transform of its impulsive admittance (see Section A.3), we have

$$Y_m(p) = \int_0^{\infty} W_m(\tau)\, e^{-p\tau}\, d\tau = \sum_{r=0}^{\infty} \frac{(-p)^r}{r!} \int_0^{\infty} \tau^r\, W_m(\tau)\, d\tau . \tag{9}$$

The first $m$ terms in this series vanish. Hence $Y_m(p)$ will be of the form

$$Y_m(p) = p^m\, y_m(p) \tag{10}$$

where $y_m(0) \neq 0$. This permits us to regard the network whose impulsive admittance is $W_m(\tau)$ as an instantaneous $m$th order differentiator, corresponding to the factor $p^m$ in (10), in tandem with a purely smoothing network whose transmission function is $y_m(p)$.

It is convenient to associate a weighting function $w_m(\tau)$ with the purely smoothing network whose transmission function is $y_m(p)$. Dividing (10) through by $p^m$, the resulting operational equation may be interpreted (see Section A.5) to mean that the weighting function $w_m(\tau)$ is the $m$-fold integral of the impulsive admittance $W_m(\tau)$ between the limits 0 and $\tau$. This is expressed by

$$w_m(\tau) = \int_0^{\tau} \cdots \int_0^{\tau} W_m(\tau)\, (d\tau)^m . \tag{11}$$

By a relationship similar to (9) between $y_m(p)$ and $w_m(\tau)$, it follows from $y_m(0) \neq 0$ that

$$\int_0^{\infty} w_m(\tau)\, d\tau \neq 0 .$$

Hence the $w_m(\tau)$ may be normalized in the sense that

$$\int_0^{\infty} w_m(\tau)\, d\tau = 1$$

for all values of $m$. However, this may be done in general only if the $G_m(\tau)$ polynomials are not normalized in the sense that $k_m = 1$ for any value of $m > 0$. It is in fact readily shown that the coefficient of $\tau^m$ in $G_m(\tau)$ must be the same as that of $\tau^m$ in $e^{-\tau}$.

11.3 LEGENDRE POLYNOMIALS

The Legendre polynomials $P_m(x)$ are orthogonal with respect to the range $-1 \le x \le 1$ and uniform weighting. In other words, the polynomials $P_m(2\tau - 1)$ are orthogonal with respect to the range $0 < \tau < \infty$ and the weighting functionᶜ

$$W_0(\tau) = \begin{cases} 1 & \text{when } 0 \le \tau \le 1 \\ 0 & \text{when } \tau > 1 . \end{cases}$$

ᶜ The unit of time being equal to the nominal smoothing time.

It is known from Section 10.4 that this form for the weighting function $W_0(\tau)$ is best in case the tracking errors are flat random noise. In the integral (1) to be minimized, the $G_m(\tau)$ polynomials should then be

$$G_m(\tau) = (-)^m\, \frac{m!}{(2m)!}\, P_m(2\tau - 1) .$$

The first few of these are tabulated below.

| $m$ | $G_m(\tau)$ |
|---|---|
| 0 | $1$ |
| 1 | $\frac{1}{2} - \tau$ |
| 2 | $\frac{1}{12} - \frac{\tau}{2} + \frac{\tau^2}{2}$ |
| 3 | $\frac{1}{120} - \frac{\tau}{10} + \frac{\tau^2}{4} - \frac{\tau^3}{6}$ |

With the help of the formula

$$\int_{-1}^{1} \left[P_m(z)\right]^2 dz = \frac{2}{2m + 1}$$

it is readily determined that

$$\frac{1}{k_m} = \int_0^{\infty} \left[G_m(\tau)\right]^2 W_0(\tau)\, d\tau = \frac{(m!)^2}{(2m)!\,(2m + 1)!} .$$

Then, by (6),

$$W_m(\tau) = \begin{cases} (-)^m\, \dfrac{(2m + 1)!}{m!}\, P_m(2\tau - 1) & 0 \le \tau \le 1 \\ 0 & \tau > 1 . \end{cases}$$

Substituting this in turn into (11) and making use of Rodrigues' formula

$$P_m(x) = \frac{(-)^m}{2^m\, m!}\, \frac{d^m}{dx^m} (1 - x^2)^m$$

or

$$P_m(2\tau - 1) = \frac{(-)^m}{m!}\, \frac{d^m}{d\tau^m} \left[\tau(1 - \tau)\right]^m$$

it is finally found that

$$w_m(\tau) = \begin{cases} \dfrac{(2m + 1)!}{(m!)^2} \left[\tau(1 - \tau)\right]^m & 0 \le \tau \le 1 \\ 0 & \tau > 1 . \end{cases} \tag{12}$$

By a relationship of the form of (9) the transmission functions $y_m(p)$ corresponding to the weighting functions $w_m(\tau)$ may be determined. The first three are

$$y_0(p) = \frac{1 - e^{-p}}{p}$$

$$y_1(p) = \frac{6}{p^3} \left[(p - 2) + (p + 2)\, e^{-p}\right]$$

$$y_2(p) = \frac{60}{p^5} \left[(p^2 - 6p + 12) - (p^2 + 6p + 12)\, e^{-p}\right] .$$

These may be written in the form

$$y_m(p) = Q_m(\omega)\, e^{-p/2} \tag{13}$$

where, with $p = i\omega$ and $x = \omega/2$,

$$Q_0(\omega) = \frac{\sin x}{x}$$

$$Q_1(\omega) = \frac{3}{x^2} \left(\frac{\sin x}{x} - \cos x\right)$$

$$Q_2(\omega) = \frac{15}{x^2} \left[\left(\frac{3}{x^2} - 1\right) \frac{\sin x}{x} - \frac{3}{x^2} \cos x\right] \tag{14}$$

or in the infinite power-series form

$$y_0(p) = \sum_{n=0}^{\infty} \frac{(-p)^n}{(n + 1)!}$$

$$y_1(p) = 6 \sum_{n=0}^{\infty} \frac{n + 1}{(n + 3)!}\, (-p)^n$$

$$y_2(p) = 60 \sum_{n=0}^{\infty} \frac{(n + 1)(n + 2)}{(n + 5)!}\, (-p)^n . \tag{15}$$

Methods for obtaining physically realizable approximations to the weighting functions $w_m(\tau)$ or impulsive admittances $W_m(\tau)$, based upon the $Q$ functions (14) and the series expansions (15), are described in Chapter 12.

Chapter 12

PHYSICAL REALIZATION OF DATA-SMOOTHING FUNCTIONS

This chapter will be devoted to a brief review of some of the methods and techniques which have been used in the physical realization of data-smoothing or weighting functions. The first two sections will be devoted to methods for determining physically realizable approximations to a desired weighting function. The third section takes up the use of feedback amplifiers and servomechanisms in order to avoid the use of coils of generally fantastic sizes. The final section takes up the design of resistance-capacitance networks.

Methods of deriving physically realizable approximations of best weighting functions may be divided into two classes, which may be called, for convenience, t-methods and p-methods.

The t-methods are those in which a prescribed best weighting function $W(t)$ is approximated directly by a function $W_a(t)$ of realizable form, viz., a sum of decaying exponential terms and exponentially decaying sinusoidal terms. However, the t-methods are most useful when the approximation is restricted to a sum only of exponential terms. According to the discussion in Section A.9, Appendix A, such a restriction corresponds physically to passive RC transmission networks. A t-method was used by Phillips and Weiss in the reference quoted in Section 10.3 to obtain an approximation with one decaying exponential term and one exponentially decaying sinusoidal term. However, this method rapidly becomes unwieldy as the number of terms is increased.

The p-methods are those in which the approximation is derived indirectly from the transmission function $Y(p)$ corresponding to $W(t)$. A rational function $Y_a(p)$ approximating $Y(p)$ is first determined.
If it is realizable, and it usually is, then $W_a(t) = L^{-1}[Y_a(p)]$. In general, $Y_a(p)$ will have complex poles and, therefore, $W_a(t)$ will have exponentially decaying sinusoids as well as simple exponentials. This gives the p-methods a considerable advantage over the t-methods in more efficient use of network elements. The fact that this generally calls for impractical element values in passive RLC networks is not serious. As shown in Section 12.3, the use of coils may be avoided entirely by the use of feedback amplifiers.

12.1 t-METHODS

To describe the t-method,ᵃ let

$$W_a(t) = A_1 e^{-a_1 t} + A_2 e^{-a_2 t} + \cdots + A_n e^{-a_n t} \tag{1}$$

where the $a$'s are prescribed and the $A$'s are to be determined. Two considerations are involved in the determination of the $A$'s. The first consideration is based on the relationship between the continuity conditions at $t = 0$ and the ultimate slope of the loss characteristic as expressed in the theorem in Section A.8. Accordingly, a number of relations of the type

$$\begin{aligned} A_1 + A_2 + \cdots + A_n &= 0 \\ a_1 A_1 + a_2 A_2 + \cdots + a_n A_n &= 0 \\ &\ \,\vdots \\ a_1^r A_1 + a_2^r A_2 + \cdots + a_n^r A_n &= 0 \qquad (r < n - 1) \end{aligned} \tag{2}$$

must be satisfied. This leaves $n - r - 1$ of the $A$'s for the second consideration.

The second consideration concerns the manner in which the approximation in the range $t > 0$ is to be made. The approximation may, for example, be required to pass through $n - r - 1$ points on $W(t)$ or, the first $n - r - 1$ moments of the approximation may be required to be equal to the corresponding moments of $W(t)$. The latter is expressed by relations of the type

$$\frac{A_1}{a_1^s} + \frac{A_2}{a_2^s} + \cdots + \frac{A_n}{a_n^s} = \frac{1}{(s - 1)!} \int_0^{\infty} W(t)\, t^{s-1}\, dt \qquad s = 1, 2, \ldots, n - r - 1 . \tag{3}$$

ᵃ The t-method is principally due to R. M. Foster.

Foster's investigations were concerned only with the parabolic weighting function (4) Chapter 10, so that only the first of (2) was involved. Numerical studies led to the belief that, with a given number of $a$'s, the best approximation was to be had from the case in which all of the $a$'s are equal. Hence the natural center of attention was the special form

$$W_a(t) = \left(A_1 t + A_2 t^2 + \cdots + A_{n-1} t^{n-1}\right) e^{-at} . \tag{4}$$

At large values of $t$ this expression reduces approximately to the last term, and if it is assumed that $A_{n-1} = 1$, the settling condition fixes $a$ to at least a first approximation. The rest of the work of approximating the parabola is then equivalent to a problem in polynomial approximation. Once the $A$'s are determined, a better value of $a$ can be found from the settling condition, and the process gone through again.

If the $a$'s are only approximately equal, the approximation will still behave approximately like (4) with an average value used for $a$. The difficulty with equal or nearly equal $a$'s is that it leads to networks with extreme element values. In order to secure satisfactory element values, it is generally necessary to depart substantially from the condition of equal $a$'s. This results in some, but not a large, loss of efficiency in approximating the parabola. Foster recommends that the $a$'s be chosen as a geometric series, with their geometric mean more or less around the equivalent point for equal $a$'s. With four $a$'s he suggests that the constant ratio in the series may be 3:2, whereas with only two $a$'s the ratio should be raised to 2:1. These are, however, only rough values and obviously depend on individual opinion of what constitutes an unreasonable element value.

As a matter of experience, it turns out that the characteristic first obtained usually has a rather long and slowly decaying tail, as shown in Figure 1. This, of course, is equivalent to a correspondingly long "settling time," or time before a useful prediction can be made.

[Figure 1. Approximation to parabolic weighting function, showing poor settling characteristic.]
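Once the $a$'s are prescribed, the continuity relations (2) and the moment relations (3) are linear in the $A$'s, so the t-method reduces to solving $n$ simultaneous equations. The sketch below illustrates this under stated assumptions: the target is taken to be the unit-smoothing-time parabola $6t(1 - t)$ on $0 \le t \le 1$, whose moments $\int W\,t^{s-1}\,dt$ are $1$ and $\tfrac{1}{2}$ for $s = 1, 2$, and the three $a$'s (2, 4, 8) are hypothetical values in geometric progression, not Foster's. The helper names are likewise hypothetical.

```python
from fractions import Fraction
from math import factorial


def solve_linear(matrix, rhs):
    """Gauss-Jordan elimination in exact rational arithmetic."""
    n = len(matrix)
    aug = [[Fraction(x) for x in row] + [Fraction(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [x / aug[col][col] for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[r][n] for r in range(n)]


def fit_exponentials(a, r, target_moments):
    """Solve relations (2) and (3) for the A's.

    a              -- prescribed decay constants a_1..a_n
    r              -- highest derivative of W_a forced to vanish at t = 0
    target_moments -- integrals of W(t) * t**(s-1) for s = 1, 2, ...
    """
    a = [Fraction(x) for x in a]
    rows, rhs = [], []
    for j in range(r + 1):                               # relations (2)
        rows.append([ai ** j for ai in a])
        rhs.append(Fraction(0))
    for s, mu in enumerate(target_moments, start=1):     # relations (3)
        rows.append([1 / ai ** s for ai in a])
        rhs.append(Fraction(mu) / factorial(s - 1))
    if len(rows) != len(a):
        raise ValueError("need exactly n relations for n unknown A's")
    return solve_linear(rows, rhs)


# Hypothetical example: three exponentials, W_a(0) = 0, two matched moments.
A = fit_exponentials([2, 4, 8], r=0, target_moments=[Fraction(1), Fraction(1, 2)])
```

Exact fractions are used so that the relations can be checked term by term; in practice the choice of the $a$'s, not the solution of the linear system, is where the judgment described in the text enters.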
In practice, therefore, after the preliminary design has been found, adjustments are made to bring the tail of the curve under control, partly by modifying the values of the $A$'s slightly, and partly by contracting the time scale to bring the part of the tail which remains appreciable within the allowable settling time limits. This leads to the somewhat lopsided match to the parabola shown in Figure 2.

[Figure 2. Approximation to parabolic weighting function, showing better settling characteristic.]

A method of bringing the tail of the curve under controlᵇ is to minimize the expression

$$\int_T^{\infty} \left[W_a(t)\right]^2 dt = \sum_l \sum_m C_{lm}\, A_l\, A_m \tag{5}$$

where

$$C_{lm} = \frac{e^{-(a_l + a_m) T}}{a_l + a_m}$$

under the restrictions (2) and all but the last of (3).

ᵇ Used by R. F. Wick.

The t-method used by Phillips and Weiss is based on a 3-term approximation of the form (1) in which one $a$ is real while the other two may be conjugate complex. The $a$'s are not prescribed, so that there are six parameters to be determined. Four restrictions are imposed, viz., the first of (2), the first of (3), a restriction on the value of the tail area, viz.,

$$\int_T^{\infty} W_a(t)\, dt = \sum_{l=1}^{3} \frac{A_l}{a_l}\, e^{-a_l T} ,$$

and the cross-over condition $W_a(T) = 0$. Finally, the transmitted noise power, which, under the assumption of flat random noise associated with the position data, takes the form (see Section 10.4)

$$\int_0^{\infty} \left[W_a(t)\right]^2 dt ,$$

is minimized with respect to the two remaining parameters by numerical methods.

12.2 p-METHODS

Three p-methods have been used. These will be described in chronological order.

The first p-method is one which was used by R. L. Dietzold in exploiting the use of feedback amplifiers to secure the advantages of approximations with complex exponentials. The transmission function $Y(p)$ corresponding to the best weighting function $W(t)$ is first formulated. The loss characteristic, $-20 \log_{10} |Y(i\omega)|$, is next computed and plotted against the frequency on a logarithmic scale.
Then standard equalizer design techniques are employed to approximate the loss characteristic, keeping in mind that the transmission loss in the feedback network of a feedback amplifier becomes a transmission gain for the circuit as a whole.

The second p-method is merely a more complete analytic formulation of the first, thereby avoiding the necessity for employing equalizer design techniques. It depends upon the possibility of expressing the transmission function corresponding to the best weighting function in the form of equation (13) Chapter 11, which is associated with the symmetry of the weighting function, as shown in Section A.7.

The method is based upon the determination of the envelope of the $Q$-function. The $Q$-function is first differentiated in order to obtain the equation which determines the values of $\omega$ at which the maxima and minima occur. This transcendental equation is not solved but is used to eliminate the trigonometric functions in the expression of the $Q$-function. The resulting expression, which is an irrational function of $\omega^2$, is then squared in order to make it a rational function of $\omega^2$. The substitution $p^2 = -\omega^2$ is made and the expression is then resolved into two factors of which one contains all the poles with negative real parts while the other contains all the poles with positive real parts, the two factors being conjugate complex when $p = i\omega$. The first factor is then taken as an approximation of the desired transmission function. Applying the method to the desired transmission functions defined by (13) and (14) of Chapter 11, we get

$$y_0(p) \approx \frac{2}{2 + p}, \qquad y_1(p) \approx \frac{12}{12 + 6p + p^2}, \qquad y_2(p) \approx \frac{120}{120 + 60p + 12p^2 + p^3} . \tag{6}$$

This last is the basis for the design of a position and rate smoothing circuit for a proposed computer for controlling bombers from the ground.¹¹ This design is described briefly in Chapter 13.
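The envelope construction can be checked numerically for the first two cases. The sketch below assumes the closed forms $Q_0 = \sin x / x$ and $Q_1 = 3(\sin x - x\cos x)/x^3$ with $p = i\omega$, $x = \omega/2$ (the unit of time being the smoothing time); it locates several maxima and minima of each $Q$-function by bisection on the derivative's numerator and compares $|Q|$ there with the magnitude of the corresponding rational approximation in (6). The bracketing intervals and helper names are choices made for this illustration.

```python
import math


def bisect(f, lo, hi, iters=200):
    """Root of f in [lo, hi], assuming f changes sign over the interval."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_lo < 0) == (f_mid < 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def q0(x):
    return math.sin(x) / x


def q1(x):
    return 3.0 * (math.sin(x) - x * math.cos(x)) / x ** 3


# Numerators of dQ/dx (positive factors dropped); they vanish at the extrema.
def dq0(x):
    return x * math.cos(x) - math.sin(x)


def dq1(x):
    return (x * x - 3.0) * math.sin(x) + 3.0 * x * math.cos(x)


def approx_magnitude(coeffs, x):
    """|c0 / (c0 + c1*p + c2*p^2 + ...)| evaluated at p = i*omega = 2ix."""
    p = complex(0.0, 2.0 * x)
    denominator = sum(c * p ** k for k, c in enumerate(coeffs))
    return abs(coeffs[0] / denominator)


def envelope_gaps(q, dq, coeffs, brackets):
    """| |Q| - |approximation| | at each extremum of Q found in the brackets."""
    gaps = []
    for lo, hi in brackets:
        x = bisect(dq, lo, hi)
        gaps.append(abs(abs(q(x)) - approx_magnitude(coeffs, x)))
    return gaps


brackets0 = [(k * math.pi, k * math.pi + math.pi / 2) for k in range(1, 5)]
brackets1 = [(k * math.pi - math.pi / 2, k * math.pi) for k in range(2, 6)]
gaps0 = envelope_gaps(q0, dq0, [2.0, 1.0], brackets0)         # 2 / (2 + p)
gaps1 = envelope_gaps(q1, dq1, [12.0, 6.0, 1.0], brackets1)   # 12 / (12 + 6p + p^2)
```

The gaps come out at rounding level, which is the sense in which the rational functions trace the curve through the extrema of the $Q$-functions. Between extrema the two magnitudes differ, as one would expect of an envelope-type match.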
The third p-method is based upon the ascending power-series expansion of the transmission function corresponding to the best weighting function. Examples of such power series are given by (15) of Chapter 11. The method of approximation is one which is credited to Padé in O. Perron's "Kettenbrüchen." If the discussion in Section A.8 is referred to, it will be seen to be also a method of moments.

The method consists in determining the coefficients in a rational function of the form

$$\frac{1 + a_1 p + a_2 p^2 + \cdots + a_m p^m}{1 + b_1 p + b_2 p^2 + \cdots + b_n p^n} \tag{7}$$

so that the ascending power-series expansion of the rational function will agree with that of the best transmission function, term for term up to and including $p^{m+n}$. If the series for the best transmission function is

$$1 + c_1 p + c_2 p^2 + \cdots + c_{m+n}\, p^{m+n} + \cdots \tag{8}$$

the equations which determine the coefficients in (7) are obtained by equating coefficients of corresponding powers of $p$, up to and including the $(m + n)$th, in

$$(1 + b_1 p + \cdots + b_n p^n)(1 + c_1 p + \cdots + c_{m+n}\, p^{m+n})$$

and

$$1 + a_1 p + \cdots + a_m p^m .$$

The last $n$ equations will be homogeneous in the $b$'s and $c$'s.

It has been expedient in some cases to omit the last few of the $(m + n)$ equations in order to have some control over the number of real roots and poles and the number of conjugate pairs of complex roots and poles in the resulting rational function.

In the assumed rational expression (7) the difference $n - m$ should be chosen so that the ultimate slope of the loss characteristic will be the same as for the best transmission function. According to the theorem in Section A.8, if $W(t)$ behaves like $t^r$ as $t \to 0$, we should take $n - m = r + 1$. As a matter of experience the rational expression has invariably turned out to be physically realizable whenever this "rule" was followed. Frequently, however, the rational expression has turned out to be physically realizable under small departures from the rule.
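The coefficient matching described for (7) and (8) amounts to a small linear system: the last $n$ equations fix the $b$'s, after which the $a$'s follow by direct multiplication. The sketch below is one way to carry this out in exact arithmetic; the function names are hypothetical, and the test series is assumed to be $y_2(p) = 60 \sum (n+1)(n+2)(-p)^n/(n+5)!$, the transform of $30[\tau(1-\tau)]^2$.

```python
from fractions import Fraction
from math import factorial


def solve_linear(matrix, rhs):
    """Gauss-Jordan elimination in exact rational arithmetic."""
    n = len(matrix)
    aug = [[Fraction(x) for x in row] + [Fraction(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [x / aug[col][col] for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[r][n] for r in range(n)]


def pade(c, m, n):
    """Coefficients (a_0..a_m, b_0..b_n) of form (7) matching series c through p^(m+n)."""
    c = [Fraction(x) for x in c]

    def coef(i):
        return c[i] if i >= 0 else Fraction(0)

    # Powers m+1 .. m+n: sum_{j=1..n} b_j * c_{k-j} = -c_k  (since b_0 = 1).
    rows = [[coef(k - j) for j in range(1, n + 1)] for k in range(m + 1, m + n + 1)]
    rhs = [-coef(k) for k in range(m + 1, m + n + 1)]
    b = [Fraction(1)] + solve_linear(rows, rhs)
    # Powers 0 .. m give the numerator directly.
    a = [sum(b[j] * coef(k - j) for j in range(min(k, n) + 1)) for k in range(m + 1)]
    return a, b


def y2_series(terms):
    """First coefficients of the assumed power series for y_2(p)."""
    return [Fraction((-1) ** k * 60 * (k + 1) * (k + 2), factorial(k + 5))
            for k in range(terms)]


a_coeffs, b_coeffs = pade(y2_series(5), m=0, n=4)
```

With $m = 0$ and $n = 4$ the rational function has no finite roots, so that only the $b$'s need be solved for. Omitting the last equation, as the text mentions, would correspond to passing a shorter series and fixing the final $b$ by other means.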
Examples of this method are given in Chapter 13.

12.3 USE OF FEEDBACK AMPLIFIERS AND SERVOMECHANISMS

In this section we shall describe the use of feedback amplifiers and servomechanisms to obtain desired transmission functions. For complete discussions of the most recent technical advances in the analysis and design of feedback amplifiers and servomechanisms the reader should consult some of the modern literature on these subjects.²,³,⁵,¹⁵,¹⁶,¹⁷

Let us assume that we have two networks whose transmission functions are $Y_1(p)$ and $Y_2(p)$, respectively, as shown in Figure 3. For a signal $E(t)$ applied to the first network the short-circuit output current is $I_1(t) = Y_1(p) \cdot E(t)$. For a signal $V(t)$ applied to the second network the short-circuit output current is $I_2(t) = Y_2(p) \cdot V(t)$.

[Figure 3. Schematic representation of networks for feedback circuit application.]

[Figure 4. First step in combining networks.]

With the networks sharing a common short-circuiting conductor as shown in Figure 4, the current through the conductor is $I_1 + I_2$. If the source which develops the voltage $V(t)$ across the input terminals of the second network were in fact under the control of the current through the conductor, as shown schematically in Figure 5, in such a manner that it had to develop that voltage $V(t)$ which reduces the current in the conductor to zero, then

$$Y_1(p) \cdot E(t) + Y_2(p) \cdot V(t) = 0 .$$

[Figure 5. Output voltage controlled by short-circuit current across intermediate terminals.]

Hence, the transmission function (now a voltage-voltage ratio) of the arrangement shown in Figure 5 must be

$$Y(p) = -\frac{Y_1(p)}{Y_2(p)} . \tag{9}$$

This relationship provides a method of obtaining transmission functions with complex poles without the requirement of coils. The complex roots of $Y(p)$ must be assigned to the numerator of $Y_1(p)$, and the complex poles of $Y(p)$ to the numerator of $Y_2(p)$.
Aside from this, the other roots and poles of $Y(p)$ may be assigned in any way which is favorable to good design practice. Redundant factors may be introduced if they are desirable, as is done in the examples described in Sections 13.1.5 and 13.3.

The source of the voltage $V(t)$ in Figure 5 does not* have to be controlled by the current through the short-circuiting conductor. Since the current through any short circuit must be zero if the voltage across the short-circuited terminals is zero before the short circuit is connected across them, the source of the voltage $V(t)$ may just as well be controlled by the open-circuit voltage, as shown in Figure 6. It is clear that the source of the voltage $V(t)$ is ideally an infinite gain amplifier. It is not necessary, however, that the amplifier have ideally unilateral transmission and infinite input and output impedances, since departures from these ideal characteristics may be compensated for in the design of the feedback network.

\* This observation was first made by R. L. Dietzold.

[Figure 6. Output voltage controlled by open-circuit voltage across intermediate terminals.]

The simple result expressed by (9) may be readily modified to take account of the finite gain of a physical amplifier. The modification will be expressed as an extra factor which corresponds to the "$\mu\beta$ effect" or "$\mu\beta$ error" commonly encountered in the theory and design of feedback amplifiers.

The exact transmission function of the circuit shown in Figure 6 is most simply expressed in terms of the following quantities:

$Y_1(p)$ = current through a short across terminal-pair No. 3, per unit emf applied across terminal-pair No. 1.

$Y_2(p)$ = current through a short across terminal-pair No. 3, per unit emf applied across terminal-pair No. 2.

$Z_2(p)$ = impedance between terminal-pair No. 2, with terminal-pair No. 3 shorted.

$Z_3(p)$ = impedance between terminal-pair No. 3, with amplifier dead, terminal-pair No. 1 shorted, and terminal-pair No. 2 open.

$G(p)$ = transadmittance of amplifier.

Then

$$Y = -\frac{Y_1}{Y_2} \cdot \frac{1 + \dfrac{Y_2}{G}}{1 - \dfrac{1}{G\, Y_2 Z_2 Z_3}} . \tag{10}$$

The quantity $G\, Y_2 Z_2 Z_3$ is the $\mu\beta$ of the circuit. The quantity $Y_1 Y_2 Z_2 Z_3$, to which $Y$ reduces when $G = 0$, represents the direct transmission of the circuit.

The active impedance across terminal-pair No. 2 is

$$Z_{2A} = \frac{Z_{2P}}{1 - G\, Y_2 Z_2 Z_3} \tag{11}$$

where

$$Z_{2P} = Z_2 \left(1 + Y_2^2 Z_2 Z_3\right) . \tag{12}$$

$Z_{2P}$ is the passive impedance across terminal-pair No. 2. It differs from $Z_2$ in that terminal-pair No. 3 is open.

The exact expression (10) of the transmission function is useful chiefly as a check on the simpler but approximate expression (9). It is in general quite practicable to make the transadmittance or transconductance $G$ of the amplifier large enough so that the $\mu\beta$ effect may be neglected.

In accordance with the sense in which the term "servomechanism" is used by MacColl,⁴ a feedback circuit, such as that shown in Figure 6, is a servomechanism (more specifically, an electronic servomechanism), since it operates on the ideal principle of maintaining zero voltage across the terminal-pair No. 3. An electromechanical counterpart of the circuit shown in Figure 6 is shown in Figure 7. These circuits assume that the signal $E(t)$ is a modulated d-c carrier.

[Figure 7. Electromechanical counterpart of feedback amplifier circuit resulting in servomechanism. Labeled elements: modulator, 2-phase induction motor.]

If the signal is a modulated a-c carrier, "shaping" cannot be done conveniently by electrical networks. The difficulty may be avoided by various special devices. An example is described and illustrated in Section 13.4.

12.4 DESIGN OF RC NETWORKS

In this section we will describe and illustrate two general methods of designing RC networks. The first is most useful when the transmission function is finite and not zero at zero frequency; the second, when the transmission function is zero at zero frequency.
The case of a transmission function with a pole at zero frequency will not be considered, since it is covered by the methods described in the preceding section, in conjunction with the methods described below.

Let

$$Y(p) = \frac{a_0 + a_1 p + \cdots + a_{n+1}\, p^{n+1}}{1 + b_1 p + \cdots + b_n p^n} \qquad (a_0 > 0) \tag{13}$$

with simple, real, negative poles. Dividing by $p$, expanding into partial fractions and multiplying through by $p$, we get

$$Y(p) = \frac{a_{n+1}}{b_n}\, p + \left(a_0 + \frac{A_1 p}{p + \alpha_1} + \frac{A_2 p}{p + \alpha_2} + \cdots\right) - \left(\frac{B_1 p}{p + \beta_1} + \frac{B_2 p}{p + \beta_2} + \cdots\right)$$

where the $A$'s, $B$'s, $\alpha$'s and $\beta$'s are positive real quantities. The first term must be associated with those in the first parentheses if $a_{n+1} > 0$, with those in the second parentheses if $a_{n+1} < 0$. The transmission function is now in the form

$$Y(p) = Y_A(p) - Y_B(p) \tag{14}$$

where $Y_A(p)$ and $Y_B(p)$ are physically realizable driving-point admittances of RC type.

Each term of the form $pA/(p + \alpha)$ is the admittance of the two-terminal, two-element network shown in Figure 8. Each term in (14) therefore represents a parallel combination of two-element networks of the type shown in Figure 8 and a conductance $a_0$ in the case of $Y_A(p)$, and a capacitance $|a_{n+1}|/b_n$ in the case of either $Y_A(p)$ or $Y_B(p)$. By well-known methods these two-terminal networks may be transformed into a variety of other configurations.

[Figure 8. Simple RC network.]

The transmission function (14) may be realized in the arrangement shown in Figure 9 or in that shown in Figure 10. The latter is a lattice network which is suitable only in a balanced-to-ground circuit. To obtain an unbalanced passive equivalent of this network we may resort to steps which will be described later in this section.

[Figure 9. Method of realizing RC transmission functions, requiring phase inverter. Labeled elements: phase inverter, summing amplifier.]

[Figure 10. Lattice prototype for passive networks with RC transmission characteristics. Labeled elements: line branch, lattice branch; $I = (Y_A - Y_B) \cdot E$.]
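The partial-fraction step that separates $Y(p)$ into $Y_A(p) - Y_B(p)$ can be sketched as follows. This is an illustration only: the denominator is assumed to be supplied in factored form, $\prod (1 + p/\alpha_i)$, so that no root finding is needed, and the numerator and pole values in the example are hypothetical. Positive coefficients belong to $Y_A$, and the magnitudes of negative ones to $Y_B$.

```python
from fractions import Fraction


def poly_eval(coeffs, p):
    """Evaluate coeffs[0] + coeffs[1]*p + ... by Horner's rule."""
    acc = Fraction(0)
    for c in reversed(coeffs):
        acc = acc * p + c
    return acc


def rc_split(num, alphas):
    """Expand Y(p) = N(p) / prod(1 + p/alpha_i) as
    a0 + cap*p + sum_i R_i * p / (p + alpha_i).

    num    -- a_0 .. a_{n+1} in ascending powers of p
    alphas -- distinct positive values; the poles are at p = -alpha_i
    """
    num = [Fraction(c) for c in num]
    alphas = [Fraction(a) for a in alphas]
    if len(num) != len(alphas) + 2:
        raise ValueError("numerator must be one degree higher than denominator")
    b_n = Fraction(1)
    for a in alphas:
        b_n /= a                       # leading denominator coefficient
    cap = num[-1] / b_n                # a_{n+1} / b_n, the shunt-capacitance term
    residues = []
    for i, ai in enumerate(alphas):
        rest = Fraction(1)
        for j, aj in enumerate(alphas):
            if j != i:
                rest *= 1 - ai / aj    # remaining factors evaluated at p = -alpha_i
        residues.append(-poly_eval(num, -ai) / rest)
    return num[0], cap, residues


def evaluate_branches(a0, cap, residues, alphas, p):
    """Return (Y_A, Y_B) at a given p, sorting terms by the sign of each coefficient."""
    p = Fraction(p)
    y_a, y_b = a0, Fraction(0)
    terms = [(cap, p)] + [(r, p / (p + Fraction(a))) for r, a in zip(residues, alphas)]
    for coeff, value in terms:
        if coeff >= 0:
            y_a += coeff * value
        else:
            y_b += -coeff * value
    return y_a, y_b


# Hypothetical example: N(p) = 1 + 4p + 3p^2 + p^3, poles at p = -1 and p = -2.
example_num, example_alphas = [1, 4, 3, 1], [1, 2]
a0, cap, residues = rc_split(example_num, example_alphas)
```

For this example the expansion is $1 + 2p + 2p/(p+1) - 3p/(p+2)$, so $Y_B$ consists of a single two-element branch. Whether a given split leads to reasonable element values is a separate question which the sketch does not address.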
The second general method of designing RC networks is most useful when

$$Y(p) = p\, \frac{a_0 + a_1 p + \cdots + a_n p^n}{1 + b_1 p + \cdots + b_n p^n} \qquad (a_0 > 0) \tag{15}$$

with simple, real, negative poles. Now, if the lattice in Figure 10 were driven from an infinite-impedance source of current $I_s$, the output current would be

$$I = I_s\, \frac{Y_A - Y_B}{Y_A + Y_B} .$$

If, furthermore,

$$\frac{Y_A}{Y_B} = \frac{k + \dfrac{Y(p)}{p}}{k - \dfrac{Y(p)}{p}} \tag{16}$$

then

$$I = \frac{I_s}{k} \cdot \frac{Y(p)}{p} .$$

Taking it for granted for the moment that the lattice can be transformed as shown schematically in Figure 11, we may then discard the condenser across the output terminals and, by Thévenin's theorem, we may replace the condenser across the input terminals and the infinite-impedance current source by a series condenser and a zero-impedance voltage source. The result is shown in Figure 12. Since $I_s = pCE$ we now have

$$I = \frac{C}{k}\, Y(p)\, E$$

which is the desired result, to a constant factor.

[Figure 11. Step in transformation of networks with zero transmission at zero frequency.]

[Figure 12. Final step in transformation of networks with zero transmission at zero frequency.]

The factor $k$ should in general be taken as small as possible subject to the requirement that all the roots and poles of (16) be simple, real, and negative. It can always be taken large enough to fulfill this requirement. A suitable value may be easily chosen by inspection of a plot of $Y(p)/p$ for negative real values of $p$.

The numerator and denominator of (16) are of equal degree and therefore contain the same number of linear factors. These factors may be assigned to $Y_A$ or to $Y_B$ arbitrarily except that $Y_A$ and $Y_B$ must be physically realizable driving-point admittance functions which behave ultimately like condensers as the frequency increases indefinitely; that is, roots and poles must alternate and there must be a simple pole at infinity.

There are five kinds of steps which may be taken to transform a lattice into an unbalanced form.
These steps are based upon Bartlett's bisection theorem,¹⁴ and may be taken in any order and as often as necessary. Each of them will now be described as it would be applied directly to Figure 10. In the following diagrams a lattice enclosed in a rectangle means an unbalanced network whose configuration may not be known yet, but whose lattice prototype is as indicated.

1. Shunt network pulled out of both branches: shown in Figure 13.
2. Shunt network pulled out of the line branch only: shown in Figure 14.
3. Series network pulled out of both branches: shown in Figure 15.ᶜ
4. Series network pulled out of the lattice branch only: shown in Figure 16.ᶜ
5. Breakdown into parallel lattices: a fairly obvious step which need not be illustrated.

ᶜ Given in impedance form.

[Figure 13. Step in transformation of lattice; shunt networks pulled out of both branches.]

[Figure 14. Step in transformation of lattice; shunt network pulled out of line branch only.]

[Figure 15. Step in transformation of lattice; series networks pulled out of both branches.]

[Figure 16. Step in transformation of lattice; series network pulled out of lattice branch only.]

As an example of (13) consider

$$Y(p) = \frac{a_0 + a_1 p + a_2 p^2}{1 + b_1 p}$$

where all the coefficients are positive. Since

$$Y(p) = \frac{a_2}{b_1}\, p + a_0 - \frac{a_2 - a_1 b_1 + a_0 b_1^2}{b_1} \cdot \frac{p}{1 + b_1 p}$$

there is no problem if $a_1 > (a_2/b_1) + a_0 b_1$. But if $a_1 < (a_2/b_1) + a_0 b_1$ we have the problem of transforming the lattice in Figure 17. We can apply steps 2 and 4 immediately, but find that the residual lattice cannot be transformed unless $a_1 > (a_2/b_1)$. Under this additional restriction we can apply step 3, obtaining finally the network shown in Figure 18.

[Figure 17. Illustrative lattice prototype.]
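The three situations distinguished in this example depend only on where $a_1$ falls relative to $a_2/b_1$ and $(a_2/b_1) + a_0 b_1$. The short sketch below expresses that comparison and confirms the expansion of $Y(p)$ at a sample point; the case labels and function names are inventions of this illustration, not terms from the text.

```python
from fractions import Fraction


def expand_example(a0, a1, a2, b1):
    """Return (shunt capacitance a2/b1, conductance a0, coefficient of p/(1 + b1*p))."""
    a0, a1, a2, b1 = (Fraction(v) for v in (a0, a1, a2, b1))
    return a2 / b1, a0, a1 - a2 / b1 - a0 * b1


def classify_example(a0, a1, a2, b1):
    """Hypothetical labels for the three cases discussed in the text."""
    a0, a1, a2, b1 = (Fraction(v) for v in (a0, a1, a2, b1))
    if a1 > a2 / b1 + a0 * b1:
        return "direct"              # every term positive; no lattice is needed
    if a1 > a2 / b1:
        return "lattice-unbalanced"  # the lattice steps lead to an unbalanced network
    return "lattice-only"            # the residual lattice cannot be transformed


def y_direct(a0, a1, a2, b1, p):
    """Y(p) evaluated from the original rational form."""
    a0, a1, a2, b1, p = (Fraction(v) for v in (a0, a1, a2, b1, p))
    return (a0 + a1 * p + a2 * p * p) / (1 + b1 * p)


def y_expanded(a0, a1, a2, b1, p):
    """Y(p) evaluated from the three-term expansion."""
    cap, g, coeff = expand_example(a0, a1, a2, b1)
    p, b1 = Fraction(p), Fraction(b1)
    return cap * p + g + coeff * p / (1 + b1 * p)
```

The boundary case $a_1 = (a_2/b_1) + a_0 b_1$, in which the last term vanishes, is not treated separately here.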
As an example of (15) consider

$$Y(p) = \frac{p\,(1 + 12p)}{(1 + 8p)(1 + 16p)} .$$

Taking $k = 1$ (the smallest value which may be assigned), we get

$$\frac{Y_A}{Y_B} = \frac{(1 + 2p)(1 + 16p)}{2p\,(3 + 16p)} .$$

One way of choosing $Y_A$ and $Y_B$ is

$$Y_A = \frac{(1 + 2p)(1 + 16p)}{2\,(3 + 16p)}, \qquad Y_B = p .$$

This leads finally to the network shown in Figure 19. Such a simple network is possible of course because $Y(p)$ happens to satisfy the requirements of a physically realizable driving-point admittance function. However, another way of choosing $Y_A$ and $Y_B$ is

$$Y_A = \frac{1 + 2p}{2}, \qquad Y_B = \frac{p\,(3 + 16p)}{1 + 16p} .$$

This leads to the network shown in Figure 20.

[Figure 18. Unbalanced equivalent of illustrative lattice prototype when $a_2/b_1 < a_1 < (a_2/b_1) + a_0 b_1$.]

[Figure 19. RC network with zero transmission at zero frequency.]

[Figure 20. Another RC network with zero transmission at zero frequency.]

Chapter 13

ILLUSTRATIVE DESIGNS AND PERFORMANCE ANALYSIS

The illustrative material described in this chapter is taken from four practical applications.

1. Second-derivative circuit for the M9 antiaircraft director.
2. Position data smoother for the "close support plotting board," with delay correction for constant-velocity aircraft.
3. Position and rate circuit for the "computer for controlling bombers from the ground," with optional delay correction of position data for constant-velocity aircraft.
4. Position and rate circuit using electromechanical servomechanisms.

The design and analytical procedure used in the first application has not heretofore been described in writing. Hence, considerably more space will be devoted to it than to the other three applications. The latter have been described in detail in reports.

13.1 SECOND-DERIVATIVE CIRCUIT DESIGN

13.1.1 Realizable Approximation of Best Transmission Function

The best transmission function for the second-derivative circuit was taken to be $Y_2(p) = p^2\, y_2(p)$, in the notation of Chapter 11.
This assumes flat random noise in position data and, arbitrarily, 1-second smoothing and settling time. The series expansion of $y_2(p)$ is, according to expressions (15) of Chapter 11,

$$y_2(p) = 1 - \frac{1}{2}\, p + \frac{1}{7}\, p^2 - \frac{5}{168}\, p^3 + \frac{5}{1008}\, p^4 - \cdots .$$

The form of the rational approximation,

$$y_{2a}(p) = \frac{1}{1 + b_1 p + b_2 p^2 + b_3 p^3 + b_4 p^4} ,$$

was chosen for simplicity under the requirement that the transmission function $p^2\, y_{2a}(p)$ should cut off at the rate of 12 db per octave.ᵃ This requirement was set as a precaution against noise due to granularity of the coordinate-conversion potentiometers in the director.

ᵃ The design antedated the formulation of the $n - m = r + 1$ rule given in Section 12.2, according to which the best transmission function should have been taken as $p^2\, y_3(p)$ in the notation of Chapter 11. However, no trouble was experienced in obtaining a physically realizable approximation of the complexity assumed.

Following the procedure outlined in Section 12.2 the following equations were obtained:

$$\begin{aligned} b_1 - \frac{1}{2} &= 0 \\ b_2 - \frac{1}{2}\, b_1 + \frac{1}{7} &= 0 \\ b_3 - \frac{1}{2}\, b_2 + \frac{1}{7}\, b_1 - \frac{5}{168} &= 0 \\ b_4 - \frac{1}{2}\, b_3 + \frac{1}{7}\, b_2 - \frac{5}{168}\, b_1 + \frac{5}{1008} &= 0 \end{aligned}$$

so that

$$b_1 = \frac{1}{2}, \qquad b_2 = \frac{3}{28}, \qquad b_3 = \frac{1}{84}, \qquad b_4 = \frac{1}{1764},$$

whence

$$y_{2a}(p) = \frac{1764}{p^4 + 21p^3 + 189p^2 + 882p + 1764} .$$

Since

$$p^4 + 21p^3 + 189p^2 + 882p + 1764 = \left(p^2 + \frac{21 + \sqrt{21}}{2}\, p + 42\right) \left(p^2 + \frac{21 - \sqrt{21}}{2}\, p + 42\right) ,$$

$y_{2a}(p)$ would have two conjugate pairs of complex poles, viz.,

$$p = -6.40 \pm i\,1.047, \qquad -4.10 \pm i\,5.02 ,$$

of which one pair is very nearly real.

In order to simplify the circuit design, however, it was desirable to limit the number of complex poles to a single conjugate pair. This was accomplished by leaving $b_4$ arbitrary so that the denominator of $y_{2a}(p)$ was

$$1 + \frac{1}{2}\, p + \frac{3}{28}\, p^2 + \frac{1}{84}\, p^3 + b_4 p^4 .$$

A value for $b_4$ which would make this expression vanish at two negative real values of $p$ was found by plotting

$$1764\, b_4 = \frac{21}{x^4} \left(x^3 - 9x^2 + 42x - 84\right)$$

against $x = -p$, as shown in Figure 1. The right-hand member is positive only in the range $x > 3.77$ and has a maximum of 0.982 at about $x = 6.63$.

[Figure 1. Graphical determination of $b_4$. Ordinate: $1764\, b_4$; abscissa: $x$.]
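The curve used for the graphical determination of $b_4$ is easily tabulated. The following sketch evaluates the right-hand member $21(x^3 - 9x^2 + 42x - 84)/x^4$, locates its zero crossing by bisection and its maximum by a ternary search. The search intervals are choices made here for illustration, and the helper names are hypothetical.

```python
def b4_curve(x):
    """Right-hand member of the plotted relation: the value of 1764*b4 that
    puts a real pole of the denominator at p = -x."""
    return 21.0 * (x ** 3 - 9.0 * x ** 2 + 42.0 * x - 84.0) / x ** 4


def bisect_root(f, lo, hi, iters=100):
    """Root of f in [lo, hi], assuming a sign change over the interval."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_lo < 0) == (f_mid < 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def ternary_max(f, lo, hi, iters=200):
    """Location of the maximum of a function that rises and then falls on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)


x_zero = bisect_root(b4_curve, 3.0, 5.0)   # curve is positive to the right of this point
x_peak = ternary_max(b4_curve, 4.0, 12.0)  # location of the maximum
peak_value = b4_curve(x_peak)
```

Any ordinate between zero and the peak value is met by the curve at two abscissas, which are the two negative real poles obtained for that choice of $1764\, b_4$. The peak is quite flat, so its location is less sharply defined than its height.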
Graphical determination of 6«. In order to obtain a substantial separation between the two real poles of y 2 (p), the value 17646, = 0.5 was chosen. The approximation V(P) 1 3528 has poles at p - - 4.17391 , - 31.72813 , - 3.04898 * t 4.16463 . The series expansion of y., (p) agrees with that of V t (p) to four terms, the fifth term being 37/7056 p* instead of 5/1008 p\ The difference in the fifth term is less than 6 per cent. The realized approximation and the best weighting function are shown in Figure 3. is.u Transient Responses The responses of the physical network whose transmission function is p 2 y 2 (p) are compared to those of the best network whose transmis- sion function is p 2 y 2 (p), in Figures 2, 3, and 4. The signals for which (and the formulas by which) these responses were computed are tabulated below. Response formulas Realized Best L~Hm(p)\ 00/(1 -20(1 -/) L~ l \Vdv)\ mu\-t)\* Figure Signal / <0 I £0 2 1 3 t 4 o >f V /'(10- 15/ + 6/ 1 ) It has been noted that Figure 3 also repre- sents the best and the realized weighting func- tions. mauko u u it _II»T \ < h » • 1 » \ t \ « u V to \ 1 \ \ V* * t V 1M M V HB IM Mm 1 Figure 2. Responses to step function, viz., E (t) = 1 when t > 0. u u u <u \ A, ! . ICST w i KALIItO M % — t Figure 3. Responses to linear ramp function, vfz., E(t) - t when t > 0; second derivative smoothing functions. ~0~ Figure 4. Responses to parabolic ramp function, viz., E(t) = (%)£ when t > 0; second derivative settling characteristics. CONFIDENTIAL SECOND-DERIVATIVE CIRCUIT DESIGN 127 If a signal of the form Eif) = a t + a J + -., (hfi were to be applied suddenly to the second -de- rivative circuit at t = the response would be r '-; ! (;)-•;•< (?)+*.•<■(?) where A,„ A,, A . stand for the responses shown in Figures 2, 3, and 4, respectively, and where t is the time in seconds and T is the nominal smoothing time. The response V(t) is the indi- cated acceleration of the target. 
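The numerical statements of Section 13.1.1 about the approximation with $1764\,b_4 = 0.5$ can be checked with a few lines of arithmetic. The following is a minimal sketch and not part of the original report: the function names are illustrative, the coefficients $b_1 = 1/2$, $b_2 = 3/28$, $b_3 = 1/84$ are those implied by the quartic $p^4 + 21p^3 + 189p^2 + 882p + 1764$ quoted in the text, and $x$ is taken to mean $-p$.

```python
from fractions import Fraction


def reciprocal_series(b, n_terms):
    """Coefficients c[k] of 1/(1 + b[0]*p + b[1]*p**2 + ...) as a power series in p."""
    c = [Fraction(1)]
    for k in range(1, n_terms):
        acc = Fraction(0)
        for j in range(1, min(k, len(b)) + 1):
            acc += b[j - 1] * c[k - j]
        c.append(-acc)
    return c


# b1, b2, b3 as implied by the quartic quoted in the text.
B123 = [Fraction(1, 2), Fraction(3, 28), Fraction(1, 84)]
SERIES_MATCHED_B4 = reciprocal_series(B123 + [Fraction(1, 1764)], 5)
SERIES_CHOSEN_B4 = reciprocal_series(B123 + [Fraction(1, 3528)], 5)


def b4_curve(x):
    """Value of 1764*b4 for which the denominator vanishes at p = -x."""
    return 21.0 * (x**3 - 9.0 * x**2 + 42.0 * x - 84.0) / x**4


def scaled_denominator(p):
    """3528 * (1 + p/2 + 3p^2/28 + p^3/84 + p^4/3528)."""
    return p**4 + 42 * p**3 + 378 * p**2 + 1764 * p + 3528


# Poles as quoted in the text for 1764*b4 = 0.5.
QUOTED_POLES = [-4.17391, -31.72813,
                complex(-3.04898, 4.16463), complex(-3.04898, -4.16463)]

if __name__ == "__main__":
    print("series with b4 = 1/1764:", [str(c) for c in SERIES_MATCHED_B4])
    print("fifth coefficient with b4 = 1/3528:", SERIES_CHOSEN_B4[4])
    for pole in QUOTED_POLES:
        print(pole, abs(scaled_denominator(pole)))
    print("1764*b4 at x = 4.17391:", b4_curve(4.17391))
```

Exact rational arithmetic is used for the series so that the fifth coefficients can be compared as fractions; the poles are only substituted back into the denominator, since the text quotes them to five decimals.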
The sudden application of the instantaneous position and velocity components of the signal to the second-derivative circuit will give rise to some very serious consequences unless special measures are taken to mitigate them. To see this let it be assumed that $T = 20$ seconds and that the target is at such a range that $a_0 = 20{,}000$ yards when the signal $E(t)$ is applied to the second-derivative circuit. Each unit of $A_0$ in the ordinate scale of Figure 2 then represents an indicated acceleration of $a_0/T^2 = 50$ yd per sec$^2$. Referring to Figure 2 it is clear not only that the effective settling time will be several times the smoothing time but also that the indicated acceleration will go through exceedingly large maxima.

Exceedingly large transient responses are not peculiar to second-derivative circuits. They occur also in first-derivative circuits in linear prediction, where they are due entirely to the initial position term in the signal. In all cases they are reduced to harmless proportions by special arrangements of the circuits during the operation of slewing.

13.1.3 Effect of Tracking Errors on Accuracy of Prediction

The statistical effect of tracking errors on the accuracy of prediction is most readily determined from the power spectrum of the tracking errors and the transmission function of the prediction circuit.

Table 1 gives the values of the transmission function $Y_1$ of the first-derivative circuit in the M9 director, referred to a nominal smoothing time of 1 second, and the transmission function $Y_2$ of the experimental second-derivative circuit design, also referred to a nominal smoothing time of 1 second.

Table 1*

9f     Re Y1     Im Y1     Re Y2     Im Y2
 1     0.174     0.666    -0.454     0.165
 2     0.651     1.166    -1.442     1.212
 3     1.312     1.358    -2.014     3.527
 4     1.943     1.203    -1.069     6.688
 5     2.382     0.821     2.000     9.409
 6     2.599     0.364     6.575    10.115
 7     2.637    -0.067    10.893     8.220
 8     2.558    -0.429    13.468     4.695
 9     2.416    -0.711    14.096     0.953
10     2.242    -0.920    13.401    -2.092
11     2.062    -1.070    12.064    -4.320
12     1.885    -1.172    10.530    -5.777
13     1.720    -1.238     9.027    -6.704
14     1.566    -1.279     7.652    -7.169
15     1.429    -1.299     6.438    -7.398
16                         5.382    -7.446
17                         4.471    -7.374
18     1.096    -1.286     3.683    -7.221
19     1.004    -1.268     3.015    -7.025
20     0.926    -1.247     2.436    -6.795
22     0.790    -1.198     1.509    -6.292
24     0.683    -1.145     0.818    -5.780
26     0.593    -1.091     0.301    -5.287
28     0.518    -1.040     0.088    -4.828
30     0.457              -0.380    -4.402
32     0.407    -0.945    -0.599    -4.016
34     0.364    -0.902    -0.762    -3.666
36     0.326    -0.862    -0.881    -3.348
38     0.296    -0.825    -0.967    -3.062
40     0.266    -0.790    -1.026    -2.800

* $f$ is in c when smoothing time $T = 1$ sec. For $T$-second networks, values of $f$ are multiples of $1/9T$ c, values of $Y_1$ should be divided by $T$, and values of $Y_2$ should be divided by $T^2$. The two networks may have different values of $T$. Blank entries are not legible in the source.

The transmission function of the linear prediction circuit with 10-second smoothing of first derivative is then

$$Y_l(p) = 1 + \frac{G_1}{10}\,Y_1(10p),$$

while that of the quadratic prediction circuit with 20-second smoothing of second derivative is

$$Y_q(p) = Y_l(p) + \frac{G_2}{400}\,Y_2(20p),$$

where $G_1$ and $G_2$ are determined in accordance with the discussion in Section A.10. Since

$$Y_1(p) = p(1 - 0.3724p + \cdots), \qquad Y_2(p) = p^2(1 - \cdots),$$

we get

$$G_1 = t_f, \qquad G_2 = \tfrac{1}{2}t_f^2 + 3.724\,t_f.$$
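The relation between Table 1 and the power transmission ratios tabulated in Table 2 can be illustrated numerically. The sketch below is an illustration under stated assumptions rather than the authors' procedure: it takes $Y_l(p) = 1 + (G_1/10)Y_1(10p)$ and $Y_q(p) = Y_l(p) + (G_2/400)Y_2(20p)$ with $G_1 = t_f$ and $G_2 = \tfrac{1}{2}t_f^2 + 3.724\,t_f$, which is the reading of the damaged source expressions that reproduces the tabulated values, and it uses only three rows of Table 1. The function and constant names are hypothetical.

```python
# Selected rows of Table 1 (T = 1 s), keyed by the row index n = 9f.
# Each entry is (Y1, Y2) as complex numbers.
TABLE1 = {
    1: (complex(0.174, 0.666), complex(-0.454, 0.165)),
    2: (complex(0.651, 1.166), complex(-1.442, 1.212)),
    4: (complex(1.943, 1.203), complex(-1.069, 6.688)),
}

T_LINEAR = 10.0     # smoothing time of the first-derivative network, seconds
T_QUADRATIC = 20.0  # smoothing time of the second-derivative network, seconds


def gains(t_f):
    """Potentiometer factors G1 and G2 (assumed form, see the lead-in)."""
    return t_f, 0.5 * t_f**2 + 3.724 * t_f


def linear_power_ratio(t_f, n):
    """|Y_l|^2 at 90f = n: Table 1 row n of Y1, divided by T = 10."""
    g1, _ = gains(t_f)
    y1 = TABLE1[n][0] / T_LINEAR
    return abs(1 + g1 * y1) ** 2


def quadratic_power_ratio(t_f, n):
    """|Y_q|^2 at 90f = n: Y2 comes from Table 1 row 2n, divided by T^2 = 400."""
    g1, g2 = gains(t_f)
    y1 = TABLE1[n][0] / T_LINEAR
    y2 = TABLE1[2 * n][1] / T_QUADRATIC**2
    return abs(1 + g1 * y1 + g2 * y2) ** 2


if __name__ == "__main__":
    for t_f in (5.0, 10.0):
        for n in (1, 2):
            print(t_f, n,
                  round(linear_power_ratio(t_f, n), 2),
                  round(quadratic_power_ratio(t_f, n), 2))
```

Because the second-derivative network has twice the smoothing time of the first-derivative network, a frequency that falls on row $n$ of Table 1 for $Y_1$ falls on row $2n$ for $Y_2$; this is why Table 1 extends to row 40 while Table 2 stops at $90f = 20$.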
Table 2 gives the values of $|Y_l(p)|^2$ and of $|Y_q(p)|^2$ for $t_f = 5, 10, 15, 20$ seconds. These are plotted in Figures 5, 6, 7, and 8.

The last column of Table 2 and Figure 9 give the power spectrum of a composite of the range and transverse errors in a typical run. The power contained in the frequency range covered by the table accounts for 78 per cent of the total power, or an rms error of 15.8 yards out of 17.9 yards.

The rms error of prediction is the square root of the power transmitted by the prediction circuit. This is tabulated on the last line of Table 2 and in the smaller table following.

[Figure 5. Power transmission ratio of linear and quadratic prediction circuits with 5-second prediction time.]

[Figure 6. Power transmission ratio of linear and quadratic prediction circuits with 10-second prediction time.]

Table 2

          t_f = 5           t_f = 10          t_f = 15          t_f = 20
90f    |Y_l|^2 |Y_q|^2   |Y_l|^2 |Y_q|^2   |Y_l|^2 |Y_q|^2   |Y_l|^2 |Y_q|^2     P*
 0       1.00    1.00      1.00    1.00      1.00    1.00      1.00    1.00     31.4
 1       1.29    1.13      1.82    1.60      2.59    2.71      3.59    4.81     33.5
 2       2.10    2.76      4.08    8.90      6.97   23.16     10.74   50.35     35.7
 3       3.20    6.85      7.19   26.73     12.96   72.51     20.51  159.43     19.7
 4       4.2    10.0      10.1    39.5      18.6   106.1      29.76  231.3       3.6
 5       5.0    10.5      12.1    39.9      22.4   104.4      35.9   223.9       2.5
 6       5.3     9.8      13.1    35.6      24.3    90.6      38.9   190.6       1.2
 7       5.4     8.8      13.2    30.8      24.6    76.6      39.4   158.4       1.6
 8       5.2     7.9      12.8    26.6      23.8    64.7      38.2   131.8       2.1
 9       5.0     7.1      12.2    23.0      22.5    55.0      36.0   110.6       1.4
10       4.7     6.3      11.4    20.0      21.0    47.0      33.5    93.5       0.7
11       4.4     5.7      10.5    17.5      19.3    40.4      30.8    79.6       0.8
12       4.1     5.1       9.7    15.3      17.7    35.0      28.3    68.2       0.8
13       3.8     4.6       8.9    13.5      16.3    30.4      25.8    58.9       0.5
14       3.6     4.2       8.2    12.1      14.9    27.1      23.6    52.0       0.3
15       3.4     3.8       7.6    10.6      13.7    23.4      21.6    44.5       0.8
16       3.2     3.5       7.0     9.5      12.6    20.6      19.8    39.0       1.1
17       3.0     3.2       6.5     8.5      11.6    18.3      18.2    34.4       0.8
18       2.8     3.0       6.0     7.7      10.7    16.3      16.8    30.4       0.4
19       2.7     2.8       5.6     7.0       9.9    14.6      15.5    27.0       0.7
20       2.5     2.6       5.3     6.3       9.2    13.1      14.4    24.1       1.0
rms error of prediction
        23.9    29.5      33.9    53.4      44.5    85.4      55.4   125.0

* $P$ is in units of 180 yd$^2$ per c.

Rms error of prediction due to tracking errors, in yards

Time of flight in seconds      5       10      15      20
Linear                        23.9    33.9    44.5    55.4
Quadratic                     29.5    53.4    85.4   125.0

It is obviously relatively disadvantageous to use quadratic prediction when the target is in fact flying a rectilinear unaccelerated course.

[Figure 7. Power transmission ratio of linear and quadratic prediction circuits with 15-second prediction time.]

[Figure 8. Power transmission ratio of linear and quadratic prediction circuits with 20-second prediction time.]

The relative advantage of linear prediction should persist for target paths with only a slight amount of curvature, but this relative advantage should decrease as the curvature is increased.
When the curvature exceeds a certain amount, the relative advantage should shift to quadratic prediction.

The determination of the minimum value of target path curvature at which quadratic prediction becomes relatively advantageous depends not only upon:

1. dispersion of the predicted point of impact due to tracking errors,

but also upon a number of others, which are:

2. actual future position of target with respect to the predicted point of impact, assuming an accurate computer and the absence of all sources of dispersion enumerated here;*
3. dispersion due to inaccuracies in the computer and data-transmission systems;
4. dispersion due to noise in the computer and data-transmission systems;
5. dispersion due to variations in actual dead time;
6. dispersion due to gun wear and to variations in powder charge, shell weight, shell shape, etc.;
7. dispersion due to variations in meteorological conditions along the path of the shell;
8. dispersion due to variability of time-fuze calibration; and
9. lethal pattern of shell burst.

* This is considered in detail in the next section.

[Figure 9. Composite power spectrum of tracking errors of experimental radar.]

In a special illustrative case, a numerical analysis, including most of these factors (estimated), showed that quadratic prediction becomes relatively advantageous when the target acceleration exceeds about 0.1g. However, this should not be taken as a general result.

13.1.4 Linear and Quadratic Prediction Errors on Constant-Velocity Circular Courses

The use of a finite number of derivatives of the tracking data for purposes of prediction is itself a source of prediction errors even if there were no tracking errors. Definite evaluation of these prediction errors can be made only if the path of the target is prescribed.
The simplest path which can be prescribed for this purpose is a circular one at constant velocity. Such a path is fairly realistic when considered in relation to the difficulty of maneuvering a bomber and to actual records of the paths of hostile bombers over London during World War II.

The position of a target flying in a circle at constant velocity, referred to the center of the circle, is expressed by the complex quantity $Re^{i\omega t}$ where $R$ is the radius of the circle and $\omega$ is the angular rate. In terms of the velocity $V$ and the transverse acceleration $A$, we have

$$R = V^2/A, \qquad \omega = A/V.$$

The predicted position is then at $R\,Y(i\omega)e^{i\omega t}$ where $Y(i\omega)$ is the transmission function of the prediction circuit. The true future position of the target, however, is at $R\exp[i\omega(t + t_f)]$. Hence, the prediction error, referred to axes fixed on the target and oriented respectively transverse to and in the direction of the present velocity, is

$$\epsilon = R\left[Y(i\omega) - e^{i\omega t_f}\right].$$

As an illustration let us consider a case in which $V = 150$ yd per sec, $A = 5$ yd per sec$^2$, and $t_f = 10$ sec. For the linear prediction circuit

$$Y_l(i\omega) = 1.0409 + i0.3296$$

and for the quadratic prediction circuit

$$Y_q(i\omega) = 0.9501 + i0.3610,$$

while

$$e^{i\omega t_f} = 0.9450 + i0.3272.$$

Hence, when the present position of the target is at $4500 + i0$ with respect to the center of the circle, the linear predicted point is at $4684 + i1483$, the quadratic predicted point is at $4276 + i1624$, while the true future position is at $4252 + i1472$. These are shown in Figure 10. The prediction error vectors are

$$\epsilon_l = 432 + i11, \qquad |\epsilon_l| = 432$$
$$\epsilon_q = 24 + i152, \qquad |\epsilon_q| = 154.$$

Referring to Figure 10 it may be observed that if the first-derivative component of the prediction were to be reduced by approximately 10 per cent a nearly perfect hit would be obtained. This suggests the possibility of determining empirical functions of the time of flight for the potentiometer factors $G_1$ and $G_2$ in order to improve the probability of kill. This would involve consideration of all of the sources of dispersion enumerated in the preceding section as well as a statistical study of target paths. Such a determination has not been attempted.

[Figure 10. Vector diagram of linear and quadratic prediction for constant-velocity circular courses. Labeled points: present position, linear predicted position, quadratic predicted position, true future position (10 sec), center of turn.]

13.1.5 Physical Configuration of the Second-Derivative Circuit

In this section we shall derive a physical configuration for the second-derivative circuit. In particular, the derivation illustrates the application of feedback to the realization of weighting functions or impulsive admittances involving complex exponentials in general.* It should be pointed out, however, that the application of feedback to the end in view is not restricted to purely electronic circuits. An application involving the use of servomechanisms will be described in Section 13.4.

* Originally proposed by R. L. Dietzold.

The transmission function which concerns us here may be expressed in the partially factored form

$$Y(p) = \frac{p^2}{(p + 0.2087)(p + 1.5864)(p^2 + 0.3049p + 0.0666)},$$

where the poles have been adjusted to correspond to $T = 20$ seconds and where a constant factor has been left out. The circuit is to be designed to work out of the amplifier in the first-derivative circuit of the M9 director.
Since this much of the first-derivative circuit has a transmission function of the form $p/(p + 0.24)$, the transmission function which we have to realize is $Y_i(p)/Y_f(p)$, where

$$Y_i(p) = \frac{p}{(p + 0.2087)(p + 1.5864)}$$

and

$$Y_f(p) = \frac{p^2 + 0.3049p + 0.0666}{p + 0.24}.$$

(The subscripts are illegible in the source; $i$ and $f$ are used here for the input and feedback networks, respectively.)

The inversion of the factor corresponding to $Y_f(p)$ is in accordance with the fact that the transmission gain through a feedback amplifier is equal to the loss in the feedback network, provided the feedback is very large. To realize the transmission function $Y_i(p)/Y_f(p)$ it is therefore necessary only to realize the transmission functions $Y_i(p)$ and $Y_f(p)$ individually. The corresponding networks are shown in Figure 11, with typical element values.

[Figure 11. Physical configuration of quadratic prediction circuit for modified M9 AA director. The element values are not legible in the source.]

The input network has four elements, whereas $Y_i(p)$ has only two parameters. Hence there are two degrees of freedom in the element values of this network. One degree of freedom must be reserved for the impedance level; the other permits some latitude in the relative values of the resistances and stiffnesses.

The feedback network has four independent elements, whereas $Y_f(p)$ has three parameters. Hence there is only one degree of freedom in the element values of this network. This degree of freedom must be reserved for the impedance level.

There is, however, one degree of freedom between the impedance levels of the two networks. This follows from the fact that the transmission function of the circuit is the ratio of the transmission functions of the individual networks. The scale factor for the transmission function of the circuit is readily determined from the fact that the transmission function must be approximately $pRC$ at small values of $p$, for a particular resistance and capacitance of Figure 11 whose subscripts are illegible in the source.
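Two of the numerical statements in Sections 13.1.4 and 13.1.5 lend themselves to a quick check: the prediction-error vectors on the circular course, and the factored form of the second-derivative transmission function for $T = 20$ seconds. The following is a minimal sketch, not taken from the report. The function names are hypothetical, the values of $Y_l(i\omega)$ and $Y_q(i\omega)$ are simply those quoted in the text, and the split into an input factor and a feedback factor follows the reading $Y = [p/(p + 0.24)]\,Y_i/Y_f$ with a numerator $p^2$ assumed for $Y(p)$.

```python
import cmath


def circular_course_errors(speed, accel, t_f, y_linear, y_quadratic):
    """Radius and prediction-error vectors, epsilon = R [Y(i w) - exp(i w t_f)]."""
    radius = speed**2 / accel
    omega = accel / speed
    true_future = radius * cmath.exp(1j * omega * t_f)
    return radius, radius * y_linear - true_future, radius * y_quadratic - true_future


# Poles quoted in Section 13.1.1 for a nominal smoothing time of 1 second.
REAL_POLES_T1 = (-4.17391, -31.72813)
COMPLEX_POLE_T1 = complex(-3.04898, 4.16463)


def factors_for_smoothing_time(T):
    """Coefficients a1, a2, c1, c0 of (p + a1)(p + a2)(p^2 + c1 p + c0) after scaling to T."""
    a1, a2 = (-pole / T for pole in REAL_POLES_T1)
    z = COMPLEX_POLE_T1 / T
    return a1, a2, -2.0 * z.real, abs(z) ** 2


def y_total(p):
    return p**2 / ((p + 0.2087) * (p + 1.5864) * (p**2 + 0.3049 * p + 0.0666))


def y_first_derivative_amplifier(p):
    return p / (p + 0.24)


def y_input(p):
    return p / ((p + 0.2087) * (p + 1.5864))


def y_feedback(p):
    return (p**2 + 0.3049 * p + 0.0666) / (p + 0.24)


if __name__ == "__main__":
    R, e_lin, e_quad = circular_course_errors(
        150.0, 5.0, 10.0, complex(1.0409, 0.3296), complex(0.9501, 0.3610))
    print("R =", R, "|e_l| =", abs(e_lin), "|e_q| =", abs(e_quad))
    print("T = 20 factors:", factors_for_smoothing_time(20.0))
    p = 0.3j
    print(y_total(p), y_first_derivative_amplifier(p) * y_input(p) / y_feedback(p))
```

Dividing the unit-time poles by $T$ is all that the adjustment to $T = 20$ seconds involves; the constant factor mentioned in the text is ignored here, as it is there.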
13.2 CIRCUIT FOR CLOSE SUPPORT PLOTTING BOARD

In this application, position data smoothing with delay correction for constant rates of change in position was required. Assuming flat random noise in position data, and, arbitrarily, 1-second smoothing time, the best transmission function for position data smoothing without delay correction is $y_0(p)$ in the notation of Section 11.3. The best transmission function for the first-derivative circuit, if it were required, is $p\,y_1(p)$. Hence, the best transmission function for position data smoothing with full delay correction is

$$Y_1(p) = y_0(p) + \tfrac{1}{2}\,p\,y_1(p).$$

This corresponds to the weighting function

$$W(t) = 2(2 - 3t), \qquad 0 < t < 1.$$

The series expansion for $Y_1(p)$ is, by (15) of Chapter 11,

$$Y_1(p) = 1 - \frac{p^2}{12} + \frac{p^3}{30} - \frac{p^4}{120} + \cdots.$$

The form of the rational approximation was chosen as

$$Y(p) = \frac{1 + a_1 p}{1 + b_1 p + b_2 p^2 + b_3 p^3}$$

in order to obtain a loss characteristic which has an ultimate slope of 12 db per octave.* This requirement was also set as a precaution against noise due to granularity of the coordinate-conversion potentiometers. The coefficients are determined by

$$b_1 - a_1 = 0$$
$$b_2 - \tfrac{1}{12} = 0$$
$$b_3 - \tfrac{1}{12}b_1 + \tfrac{1}{30} = 0$$
$$-\tfrac{1}{12}b_2 + \tfrac{1}{30}b_1 - \tfrac{1}{120} = 0,$$

whence

$$Y(p) = \frac{1 + \frac{11}{24}p}{1 + \frac{11}{24}p + \frac{1}{12}p^2 + \frac{7}{1440}p^3}.$$

This may be expressed in the form $Y(p) = Y_a(p)/Y_b(p)$, where

$$Y_a(p) = \frac{1}{1 + 0.1053p}, \qquad Y_b(p) = \frac{1 + 0.3530p + 0.04615p^2}{1 + 0.4583p}.$$

(The subscripts are illegible in the source; $a$ and $b$ are used here.) The circuit configuration is shown below in Figure 12.

[Figure 12. Physical configuration of data-smoothing circuit for close support plotting board. The element values are not legible in the source.]

* This design also antedated the formulation of the $n - m = r + 1$ rule given in Section 12.2, according to which we should have taken $Y_1(p) = y_1(p) + \tfrac{1}{2}\,p\,y_2(p)$.

13.3 CIRCUIT FOR GROUND-CONTROL BOMBING COMPUTER

In this application, rate smoothing as well as position smoothing was required.
In addition, delay correction in position, for constant rate of change, was to be available but optional, and the loss characteristic was to have an ultimate slope of 12 db per octave, or more.

In accordance with the $n - m = r + 1$ rule, the best transmission function for position data is $y_1(p)$, whereas that for rate is $p\,y_2(p)$. A number of designs were made on this basis. However, from the point of view of network economy they were inferior to a design based on $y_2(p)$ for position data. The use of $y_2(p)$ for position data is not consistent, theoretically, with the use of $p\,y_2(p)$ for rate, but the practical advantage outweighs the theoretical disadvantage.

The rational approximation used for $y_2(p)$ is the one given in (6), Section 12.2. It may be expressed as

$$Y(p) = \frac{Y_a(p)\,Y_c(p)}{Y_b(p)},$$

where

$$Y_a(p) = \frac{1}{1 + 0.2153p}, \qquad Y_b(p) = \frac{1 + 0.2847p + 0.03870p^2}{1 + 0.1359p}, \qquad Y_c(p) = \frac{1}{1 + 0.1359p}.$$

(The subscripts and the exact layout of this expression are illegible in the source; the letters $a$, $b$, $c$ are used here for the three factors in the order printed.)

[Figure 13. Physical configuration of linear prediction circuit for ground-control bombing computer. The element values are not legible in the source; alternative values are marked for delay correction and for first derivative.]

It may be noted that a redundant factor has been introduced, viz., $1 + 0.1359p$, in order to secure a physically realizable $Y_b(p)$. The coefficient was chosen so that a resistance would not be required in the shunt branch of the feedback network. Referring to the circuit configuration in Figure 13, the three factors are the transmission functions of the input network, of the feedback network ($Y_b$), and of the output network at the top.

The output impedance of the amplifier is reduced nearly to zero by virtue of shunt feedback.
Hence, the rate circuit, as shown in Figure 13, may be derived from the amplifier output through a simple additional network whose transmission function is of the form $pY(p)$, with $Y(p)$ one of the factors introduced above (the subscript is illegible in the source). Two rate outputs are provided so that the delay introduced in position may be corrected optionally without disturbing scale factors.

13.4 CIRCUIT USING SERVOMECHANISMS

In the final report, October 25, 1945, to NDRC Division 7, on the research program carried on under Contract NDCrc-178, a list is given of a number of the more important practical advantages for the use of a-c carrier in computing circuits. These advantages are:

1. Permits operation at lower levels before running into trouble with thermal noise, contact potentials, drifts due to temperature;
2. Permits use of transformers for impedance matching, voltage transformations, coupling between balanced and unbalanced circuits;
3. Permits use of hybrid coils for voltage summations of moderate precision;
4. Eliminates the necessity for modulators in servo circuits using a-c motors;
5. Permits reduction in total power consumption, rectified power for amplifiers, and voltage regulation.

However, the techniques of differentiation and of data smoothing with fixed networks in computing circuits which use d-c carrier are not applicable to computing circuits which use a-c carrier.

The circuit described here is an example of one of the techniques used in the T15-E1 experimental curved flight director.* In Figure 14 servo motors are indicated by $M$, and generators by $G$. The motors are two-phase induction motors with one phase winding of each energized directly by the carrier source at constant amplitude. The generators are essentially two-phase induction motors also with one phase winding of each energized directly by the carrier source at constant amplitude. They deliver, at the other phase windings, carrier voltage at amplitudes proportional to the angular velocities $\dot\theta_1$ and $\dot\theta_2$ of the shafts. The potentiometers are energized by the carrier source at constant amplitude. They deliver carrier voltage at amplitudes proportional to the angular positions $\theta_1$ and $\theta_2$ of the shafts from some reference positions. The position data are represented by the modulation amplitude $E$.

* The technique of using servo motors for smoothing, as described above, is due chiefly to E. L. Norton.

[Figure 14. Electromechanical linear prediction circuit.]

With amplifiers of sufficiently large voltage gain and power capacity, and motors of sufficiently large torque, the operational equations of the circuit are readily found by equating to zero the sum of the voltages applied to each amplifier. Thus

$$\theta_1 + (\alpha_1 + \beta p)\,\theta_2 = E$$
$$p\,\theta_1 - (1 + \alpha_2 p)\,\theta_2 = 0,$$

whence

$$\theta_1 = \frac{(1 + \alpha_2 p)\,E}{1 + (\alpha_1 + \alpha_2)p + \beta p^2}, \qquad \theta_2 = \frac{p\,E}{1 + (\alpha_1 + \alpha_2)p + \beta p^2}.$$

The angular position $\theta_1$ therefore represents the smoothed position data while the angular position $\theta_2$ represents the smoothed rate.

Chapter 14

VARIABLE AND NONLINEAR CIRCUITS

The past discussion has been more or less clearly directed at predictor systems having certain well-defined properties. For example, it has been tacitly assumed that the first part of the prediction system will consist of geometrical manipulations transforming the raw input data into other quantities, such as the components of velocity in Cartesian or intrinsic coordinates, which we have some physical reason to believe should be approximately constant for extended periods.* These quantities, then, are isolated explicitly in the circuit and are the actual effective inputs of the data-smoothing networks. The data-smoothing networks themselves are, of course, definitely assumed to be linear and invariable.

This is obviously a straightforward attack but it does not necessarily exhaust all possibilities. For example, advantages may be gained by using data-smoothing networks which are nonlinear or which vary with time or target position.
It may also be possible to smooth the input data according to some geometric assumption, such as straight line flight, without the necessity of isolating geometrical parameters explicitly.

This chapter attempts to illustrate these possibilities by some rather scattered examples. Data-smoothing networks which vary with time seem to give improved performance over fixed networks, and have been studied with some care. Several examples are given at the end of the chapter. None of the other lines, however, has been explored at all thoroughly. The examples of data-smoothing networks variable with time are, in a sense, illustrations of nonlinearity also, since they all operate on the assumption that the cycle of the network's variation with time begins anew at each marked change in course. Since a change in course is exactly like a tracking error, except that it is much larger, this resetting requires a nonlinear control circuit which responds to large amplitude effects but not to small ones. This, however, is evidently a very mild sort of nonlinearity. More thoroughgoing nonlinearities have not been studied. There seems to be no a priori reason for supposing that they would appreciably improve the performance of data-smoothing networks.

* (Footnote to the assumption of approximately constant quantities.) This is true ideally even in the Wiener system since Wiener assumes that transformations will be made to some suitable coordinate system, preferably the intrinsic, before the statistical prediction method is applied.

The first part of the chapter gives examples of data-smoothing schemes which do not require the isolation of geometrical parameters. They are based on degenerative feedback circuits which satisfy the requisite formal relations but which might, in some cases, be unstable in practice. This portion of the material is included primarily for its possible suggestive value rather than for its concrete practical usefulness.

14.1 THE PROTOTYPE FEEDBACK CIRCUIT

The diversity of particular circuits can be given a certain unity by regarding them all as modifications of the feedback smoothing circuit shown originally in Figure 2 of Chapter 10. In accordance with the discussion of that figure it will be convenient to suppose that the resistive feedback path is introduced to limit the gain of the amplifier proper, so that the structure reduces to an amplifier with high but finite gain and a pure capacity feedback. The circuit has a net loop gain, and is consequently degenerative, at any moderately high frequency.

For our present purposes, it is convenient to recall the general property of degenerative feedback amplifiers, that they tend to suppress any given frequency by the amount of the degenerative feedback for that frequency. This suppression obtains not only at the amplifier output but at many other points in the circuit as well. For example, it holds at the amplifier input if we combine the original applied voltage with the voltage contributed by the feedback circuit.† Thus, except for the absolute signal level, it is not necessary to transmit through the amplifier of Figure 2 of Chapter 10 in order to produce the smoothing effect. It would be sufficient to hang the input circuit of the amplifier, as a two-terminal impedance, across the circuit.

† This follows immediately from the fact that, since the characteristics of the amplifier proper are not changed by the addition of the feedback path, the output voltage is always a fixed multiple of the net input voltage.

14.2 SIMULTANEOUS SMOOTHING IN THREE COORDINATES

The property of degenerative feedback circuits which has just been described is conveniently illustrated by a three-dimensional extension of the original smoothing circuit of Figure 2 of Chapter 10. The three-dimensional circuit is shown in Figure 1.
The three input voltages are the quantities $\dot D$, $D\dot E$, and $D\dot A\cos E$, where $D$, $E$, and $A$ are, respectively, slant range, elevation, and azimuth. The three voltages will be recognized as the three components of the target motion in a tilted and rotating rectangular coordinate system. One axis of the tilted system is directed along the instantaneous line of sight to the target and the other two are perpendicular to this one in the vertical and horizontal planes respectively. It is assumed that these input rates represent target motion in a straight line, plus the usual tracking errors. The object of the smoothing system is to provide shunt impedances which will tend to suppress the tracking errors by feedback action, according to the principles described in the preceding section, without disturbing the portions of the input voltages corresponding to the assumed straight line path.

[Figure 1. Feedback smoothing in three coordinates. Blocks: coordinate converters, modulators, demodulators.]

We can simplify the analysis by restricting our attention to the special case of two-dimensional motion which occurs when the target course lies in a vertical plane passing directly through the antiaircraft position. This is illustrated in Figure 2. In this case the component $D\dot A\cos E$ is evidently zero. If we represent the voltage at the other two terminals, including both the original applied voltages and the voltages fed back through the circuit, by $V_1$ and $V_2$, the voltages coming out of the coordinate converter on the right-hand side in Figure 2 are

$$v_x = V_1\cos E - V_2\sin E$$
$$v_y = V_2\cos E + V_1\sin E. \qquad (1)$$

These voltages are differentiated, passed through a second coordinate converter, and fed back so that the output voltages must satisfy

$$V_1 = \dot D - \mu(\dot v_x\cos E + \dot v_y\sin E)$$
$$V_2 = D\dot E - \mu(\dot v_y\cos E - \dot v_x\sin E). \qquad (2)$$
In order to exhibit the smoothing action of the circuit let us denote the observed velocity components, referred to the upright and fixed rectangular coordinate system, by $u_x$ and $u_y$, so that

$$u_x = \dot D\cos E - D\dot E\sin E$$
$$u_y = D\dot E\cos E + \dot D\sin E. \qquad (3)$$

Substituting (2) and (3) into (1), we get

$$v_x = u_x - \mu\dot v_x, \qquad v_y = u_y - \mu\dot v_y,$$

or

$$\mu\dot v_x + v_x = u_x, \qquad \mu\dot v_y + v_y = u_y.$$

These show clearly that $v_x$ and $v_y$ are smoothed values of $u_x$ and $u_y$, respectively. If $\mu$ is constant the smoothing is of fixed exponential type. If $\mu$ is proportional to the time up to some maximum value, the smoothing is of the variable type described in Sections 14.6 and 14.7.

To complete the discussion of the circuit we observe that by (1)

$$V_1 = v_x\cos E + v_y\sin E$$
$$V_2 = v_y\cos E - v_x\sin E.$$

These show that $V_1$ and $V_2$ are the smoothed rate components referred to the tilted and rotating rectangular coordinate system.* The fact that the orientation of this coordinate system, which depends upon the observed angular height $E$, is not smoothed makes no difference to the computation of the leads because this computation is made instantaneously in the same coordinate system to which the smoothed rate components are instantaneously referred.

* This is the coordinate system which was used in the experimental T15 director. A complete prediction circuit can be obtained by using the three voltages described here as inputs to the lead servos in the T15 system. In the actual T15 system, rates in the tilted and rotating coordinate system were obtained by the so-called "memory point" method. The voltages $\dot D$, $D\dot E$, etc., required with the present method, might be obtained with the help of tachometers attached to the tracking shafts to measure the instantaneous values of $\dot D$, $\dot E$, and $\dot A$. An equivalent to the variable smoothing of the memory point method can be obtained by making the gains in the feedback paths in Figure 1 variable according to the principles described in a later section.
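The substitution of (2) and (3) into (1) is a short piece of trigonometric algebra, and it can be spot-checked numerically. The sketch below is illustrative only and is not part of the report: it assumes the forms of (1), (2), and (3) written out in the comments (the source equations are damaged), the function names are hypothetical, and the smoothing routine is a plain Euler integration of $\mu\dot v + v = u$ for constant $\mu$.

```python
import math
import random


def loop_residuals(E, D_dot, DE_dot, vx_dot, vy_dot, mu):
    """Residuals of v = u - mu * dv/dt after substituting (2) and (3) into (1)."""
    c, s = math.cos(E), math.sin(E)
    # (2): terminal voltages, applied rates minus the fed-back terms.
    V1 = D_dot - mu * (vx_dot * c + vy_dot * s)
    V2 = DE_dot - mu * (vy_dot * c - vx_dot * s)
    # (1): outputs of the first coordinate converter.
    vx = V1 * c - V2 * s
    vy = V2 * c + V1 * s
    # (3): observed velocity components in fixed rectangular coordinates.
    ux = D_dot * c - DE_dot * s
    uy = DE_dot * c + D_dot * s
    return vx - (ux - mu * vx_dot), vy - (uy - mu * vy_dot)


def exponential_smoothing(u_samples, mu, dt):
    """Euler integration of mu * dv/dt + v = u, starting from v = u_samples[0]."""
    v = u_samples[0]
    smoothed = []
    for u in u_samples:
        v += dt * (u - v) / mu
        smoothed.append(v)
    return smoothed


if __name__ == "__main__":
    rng = random.Random(0)
    worst = 0.0
    for _ in range(1000):
        args = [rng.uniform(-1.5, 1.5)] + [rng.uniform(-200.0, 200.0) for _ in range(4)]
        rx, ry = loop_residuals(*args, mu=rng.uniform(0.1, 20.0))
        worst = max(worst, abs(rx), abs(ry))
    print("largest residual:", worst)
    print("step response after 10 time constants:",
          exponential_smoothing([0.0] + [1.0] * 1000, mu=1.0, dt=0.01)[-1])
```

The residuals vanish for any elevation $E$ because the two coordinate converters apply inverse rotations, which is the reason the unsmoothed orientation of the tilted system drops out of the result.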
The analysis in the general case including all three coordinates is of the same nature.

Since the rate components in fixed rectangular coordinates appear in the middle of the feedback path, it is perhaps not fair to regard the circuit as an illustration of a data-smoothing device which does not rely upon the explicit isolation of the geometrical parameters of the assumed target path. It should be pointed out, however, that in comparison with a straightforward geometrical solution in which velocity components in fixed coordinates are first isolated explicitly, then smoothed, and then used to form the basis of prediction, the circuit in Figure 1 has the advantage that most of the components can be built with very low precision. What is transmitted around the feedback loop is essentially the tracking errors only. Since tracking errors are always small, very high percentage errors in the system can be tolerated.*

* An exception to this statement must be made for errors in the coordinate converters which fluctuate rapidly with target position.

[Figure 2. Feedback smoothing in two coordinates. Blocks: coordinate converters, modulators, demodulators.]

14.3 SMOOTHING NETWORKS VARIABLE WITH TARGET POSITION

It was mentioned earlier that changing the data-smoothing network with the target coordinates represented one way in which the results obtained from fixed networks could be generalized. In a sense, the coordinate conversions of Figure 1 are illustrations of these possibilities. A better illustration, however, is provided by the circuit of Figure 3. The structure is intended to give smooth slant range rate from slant range data, under the assumption of unaccelerated straight line target motion.

[Figure 3. Feedback smoothing with smoothing variable with position coordinates.]
The relation between input and output in Figure 3 is readily seen to be

$$V = \frac{d}{dt}\left[D - \mu \frac{d}{dt}(DV)\right] \quad\text{or}\quad \mu \frac{d^2}{dt^2}(DV) + V = \frac{dD}{dt} \tag{4}$$

where $\mu$ is the amplifier gain, $D$ is slant range, and $V = dD/dt$ is slant range rate. The principle of the circuit depends upon the fact that under the assumed target motion the square of the slant range, $D^2$, should be a quadratic function of time, so that $D\,(dD/dt)$ should be a linear function of time and $(d/dt)[D\,(dD/dt)]$ should be a constant. This last is the quantity which is fed back in Figure 3. If it actually is a constant, it has no further influence on the calculation, since the forward circuit includes a differentiator, and the operation of the circuit is the same as though no feedback term were present. This can be verified by setting $D = \sqrt{a + 2bt + ct^2}$, corresponding to ideal straight line flight, in equation (4). It is readily seen that the equation is satisfied by

$$V = \frac{dD}{dt} = \frac{b + ct}{\sqrt{a + 2bt + ct^2}},$$

the first or feedback term being zero.

If $D$ does not correspond exactly to straight line flight, either because of tracking errors or actual target maneuvers, on the other hand, the feedback voltage is no longer constant. In this case transmission around the loop can exist and the degenerative feedback action produces smoothing in both the input and the output voltage. In calculating the exact effect we must take account of the fact that the feedback voltage depends upon the $D$ potentiometer in the feedback circuit as well as upon the output voltage $V$. Since the $D$ potentiometer setting must include the errors in the input data, this means that the output voltage is not perfectly smoothed, even with unlimited gain around the loop. The percentage error in the output rate tends in the limit to approximate the percentage error in $D$ itself.
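The claim that the fed-back quantity $(d/dt)[D\,(dD/dt)]$ is constant in straight line flight, and varies otherwise, can be checked numerically. The sketch below is only an illustration: the helper names, the plane geometry and the target positions, velocities and acceleration are hypothetical values chosen for the check, not data from the text. It uses the identity $D\,(dD/dt) = \tfrac{1}{2}\,d(D^2)/dt$ and a central second difference.

```python
import math

def make_range(r0, v, acc=(0.0, 0.0)):
    """Return D(t), the slant range to a target that starts at r0 with velocity v
    and constant acceleration acc (hypothetical plane coordinates)."""
    def D(t):
        x = r0[0] + v[0] * t + 0.5 * acc[0] * t * t
        y = r0[1] + v[1] * t + 0.5 * acc[1] * t * t
        return math.hypot(x, y)
    return D

def fed_back(D, t, h=1.0):
    """Approximate (d/dt)[D dD/dt] = (1/2) d^2(D^2)/dt^2 by a central second difference."""
    f = lambda s: D(s) ** 2
    return 0.5 * (f(t + h) - 2.0 * f(t) + f(t - h)) / (h * h)

# Unaccelerated flight: D^2 is quadratic in t, so the fed-back quantity is constant.
straight = make_range((3000.0, 4000.0), (-100.0, 50.0))
# Same start, but with a constant acceleration: the fed-back quantity now varies.
turning = make_range((3000.0, 4000.0), (-100.0, 50.0), acc=(0.0, 5.0))

straight_values = [fed_back(straight, t) for t in (0.0, 10.0, 20.0)]
turning_values = [fed_back(turning, t) for t in (0.0, 10.0, 20.0)]
```

For the straight path the three values agree (they equal the squared target speed), so the forward differentiator removes them; for the accelerated path they differ, which is the condition under which transmission around the loop takes place.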
For practical purposes, however, this is a very satisfactory result, since in the absence of smoothing percentage errors in rates are usually many times those of the corresponding coordinates.

[Footnote e: The condensers in Figure 3 symbolize differentiation.]

It is apparent that it should be possible to construct many circuits of this general type from the differential equations of the trajectory. A second example is furnished by Figure 4. The operation of the circuit is essentially similar to that of Figure 3. It depends upon the fact that in unaccelerated straight line motion the quantity $D^2 \dot A \cos^2 E$ is a constant. Instead of multiplying by $D^2$ and $\cos^2 E$ at a single point in the feedback loop, however, separate multiplications by $D$ and $\cos E$ are introduced in the forward and feedback circuits. This permits the output to appear as a smoothed value of the quantity $D\dot A \cos E$, which will be recalled as one of the primary quantities in the circuit of Figure 1.

[Figure 4. Another example of feedback smoothing with smoothing variable with position coordinates.]

14.4 NETWORKS VARIABLE WITH TIME

In addition to making the parameters of the data-smoothing network vary as functions of the coordinates of target position we may also make them variable as functions of time. The advantage of variation with time can be understood by going back to the discussion of the analytic arc assumption and its consequences for fixed data-smoothing networks, as given in Chapters 9, 10, and 11. It will be recalled that for any given settling time there was an optimum choice of the network's weighting function. The choice of the settling time itself, however, was always a compromise. On the one hand, making the settling time too short led to too little smoothing, so that the dispersion in the resulting fire became excessive.
On the other hand, too long a settling time meant that data from previous unrelated segments were retained in the smoothing circuit during too large a proportion of an average individual segment of the target path, leaving too small a residue of the average segment as useful firing time.

It is evident that it is theoretically possible to escape the consequences of this compromise by resorting to variable structures. We need merely assume that the network always has a weighting function appropriate for a settling time equal to the time since the last change in course. This would give a small amount of smoothing shortly after a change in course, with more smoothing and consequently greater accuracy later on. No firing time, however, is sacrificed waiting for the network to settle.

In order to exploit these possibilities we must, of course, be able to design networks to give at least approximately the right sequence of weighting functions. It is also necessary to provide some sort of auxiliary controlling mechanism which will sense changes in target course and return the variable circuits in the smoothing network proper to their initial positions. These are both difficult problems which have been incompletely explored. Some elementary solutions, based principally upon modifications of the degenerative feedback smoothing circuit of Figure 2 of Chapter 10, are, however, given later in the chapter. As a preliminary, the next section gives a formal extension of the general polynomial expansion method of Chapter 11 to the variable case.

14.5 GENERAL POLYNOMIAL SOLUTION FOR VARIABLE NETWORKS

The extension of the general method of Chapter 11 to the variable case requires two modifications.

1. The lower limit of the integral to be minimized is now taken as zero, in anticipation of the possibility of discriminating between relevant and irrelevant data on the basis of time of arrival.

2. The weighting function may now depend more generally upon the variable of integration and the upper limit of integration.

With these modifications there is no longer any advantage in conducting the analysis in terms of the age variable $\tau$. To deal directly with the minimization of the integral

$$\int_0^t \left[E(\lambda) - \mathcal{E}(\lambda)\right]^2 W_0(t,\lambda)\,d\lambda, \tag{5}$$

let

$$\mathcal{E}(\lambda) = V_0 + V_1 G_1(t,\lambda) + \dots + V_n G_n(t,\lambda), \tag{6}$$

where $G_m(t,\lambda)$ is an $m$th degree polynomial in $\lambda$. Also, let

$$\int_0^t W_0(t,\lambda)\,d\lambda = 1,$$

$$\int_0^t G_l(t,\lambda)\,G_m(t,\lambda)\,W_0(t,\lambda)\,d\lambda = \begin{cases} 0 & \text{if } l \ne m \\ 1/k_m & \text{if } l = m \end{cases} \qquad (G_0 = 1,\ k_0 = 1).$$

Then (5) is a minimum with respect to the $V_m$'s in (6) if

$$V_m(t) = \int_0^t E(\lambda)\,W_m(t,\lambda)\,d\lambda \tag{7}$$

where

$$W_m(t,\lambda) = k_m\,G_m(t,\lambda)\,W_0(t,\lambda). \tag{8}$$

The possibility of physically realizing the $V_m(t)$ depends upon the possibility of realizing networks with impulsive admittances $W_m(t,\lambda)$ in the sense that $W_m(t,\lambda)$ is the response of a network, at time $t$, to a unit impulse applied at time $\lambda$, where $0 < \lambda < t$. Taking this possibility for granted, the predicted value $\mathcal{E}(t + t_f)$ is, according to (6), a variable linear combination of the $V_m(t)$, viz.,

$$\mathcal{E}(t + t_f) = V_0(t) + G_1(t, t + t_f)\,V_1(t) + \dots + G_n(t, t + t_f)\,V_n(t). \tag{9}$$

It is clear that all of the $W_m(t,\lambda)$ as well as all of the $G_m(t,\lambda)$ for $m = 1, 2, \dots$ are determined by $W_0(t,\lambda)$. The latter is determined as the best weighting function for position data smoothing, depending upon the characteristics of the noise associated with the position data. The general methods of determining the best weighting function with fixed smoothing time, described in Chapter 10, may be used to determine the best weighting function with variable smoothing time.

Under the assumption that the spectrum of the noise associated with the signal $S(t)$ has a uniform slope of $6k$ db per octave, we may take over from Section 11.3 the result that the best weighting function is

$$w_k(t,\lambda) = \frac{(2k+1)!}{(k!)^2}\,\frac{[\lambda(t-\lambda)]^k}{t^{2k+1}}, \qquad 0 \le \lambda \le t. \tag{10}$$
The response of the network is then

$$V(t) = \int_0^t S(\lambda)\,w_k(t,\lambda)\,d\lambda. \tag{11}$$

SPECIAL CASES

It will be illuminating to consider a few special cases of (11). For $k = 0$, we have

$$V(t) = \frac{1}{t}\int_0^t S(\lambda)\,d\lambda. \tag{12}$$

Multiplying through by $t$ and differentiating we get

$$t\dot V(t) + V(t) = S(t). \tag{13}$$

This suggests the circuit shown in Figure 5.^f

For $k = 1$, we have

$$V(t) = \frac{6}{t^3}\int_0^t S(\lambda)\,\lambda(t - \lambda)\,d\lambda.$$

Multiplying through by $t^3$ and differentiating twice we get

$$\frac{t^2}{6}\ddot V + t\dot V + V = S,$$

which may be rewritten in a form that suggests the network shown in Figure 6.^g

14.7 NETWORKS WITH A LIMITED RANGE OF VARIATION

By generalizing the above results in various ways a large number of other examples of variable smoothing networks can be constructed. Since unlimited variation in the smoothing time is not practically possible, or perhaps even tactically optimal, however, it is desirable in discussing any further examples to include also the possibility that the range of variation in the network may be restricted. For any positive integral value of $k$ in (11) the differential equation for $V(t)$ is of the type which may be reduced by the transformation $t = e^x$ to a linear differential equation with constant coefficients.^h In general, this facilitates the determination of what happens to the weighting function $w_k(t,\lambda)$ when $t > T$ if the variability of the network is stopped at time $T$. In the case of the first-order equation (13), however, it is just as easy to deal directly in terms of the natural time.

A more general form for (13), which readily yields the effects of a sudden or gradual stoppage of the variability of the network, is

$$\frac{\phi(t)}{\dot\phi(t)}\,\dot V(t) + V(t) = S(t). \tag{14}$$

This corresponds to the response

$$V(t) = \frac{1}{\phi(t)}\int_0^t S(\lambda)\,\dot\phi(\lambda)\,d\lambda,$$

whence the weighting function is

$$w(t,\lambda) = \frac{\dot\phi(\lambda)}{\phi(t)}. \tag{15}$$

[Footnote f: This circuit is due to S. Darlington.]

[Footnote g: Due to B. T. Weber.]

[Footnote h: See Section A.11 for a more general transformation.]
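The relation between the differential equation (14) and the weighting function (15) can be illustrated numerically. The sketch below assumes a ramp-then-exponential $\phi(t)$ (the form taken up as the first example in this section) with $T = 10$; the function names, the midpoint-rule integration and the test signals are illustrative assumptions only.

```python
import math

T = 10.0  # time at which the variation of the network is stopped

def phi(t):
    """phi(t) = t for t < T, continued as T * exp((t - T) / T) for t >= T."""
    return t if t < T else T * math.exp((t - T) / T)

def phi_dot(t):
    """Time derivative of phi; it is continuous at t = T."""
    return 1.0 if t < T else math.exp((t - T) / T)

def weight(t, lam):
    """Weighting function w(t, lambda) = phi'(lambda) / phi(t) for 0 < lambda < t."""
    return phi_dot(lam) / phi(t)

def smooth(S, t, n=20000):
    """V(t) = integral from 0 to t of S(lambda) w(t, lambda) d(lambda), midpoint rule."""
    h = t / n
    return sum(S((i + 0.5) * h) * weight(t, (i + 0.5) * h) for i in range(n)) * h

# The weighting function has unit area at every t, so a constant signal is reproduced.
areas = [smooth(lambda lam: 1.0, t) for t in (5.0, 10.0, 20.0)]

# For t < T the weighting is uniform (1/t), so V(t) is the running mean of (12).
running_mean = smooth(lambda lam: 2.0 + 0.5 * lam, 5.0)
```

Because $\phi(0) = 0$, the area under $w(t,\lambda)$ is $[\phi(t) - \phi(0)]/\phi(t) = 1$ for every $t$, which is what allows the response to be read as a weighted average of the signal.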
The general relation (14) may be realized with the network of Figure 5, by varying the resistance in accordance with

$$R = \frac{1}{C}\,\frac{\phi(t)}{\dot\phi(t)}, \qquad t > 0.$$

However, a more practical circuit results from the introduction of variable potentiometers^i in both the capacity and resistance paths of the original feedback smoothing circuit of Figure 2, Chapter 10. This is shown in Figure 7.^j It may be noted that the feedback circuit is also applicable to the two cases discussed in the preceding section. It has the advantage for these applications that it does not require the zero-impedance generators and infinite-impedance loads of Figures 5 and 6.

[Figure 5. Time-variable smoothing circuit giving uniform weighting function.]

[Figure 6. Time-variable smoothing circuit giving parabolic weighting function.]

[Figure 7. Limited range time-variable feedback smoothing circuit.]

[Footnote i: In some cases a variable potentiometer may turn out to be a switch.]

[Footnote j: This circuit is due to S. Darlington.]

As an example of (14) we may take

$$\phi(t) = \begin{cases} t & 0 < t < T \\ T e^{(t-T)/T} & t > T. \end{cases}$$

Then

$$\frac{\phi(t)}{\dot\phi(t)} = \begin{cases} t & 0 < t < T \\ T & t > T. \end{cases}$$

Hence, in Figure 7, if $RC = T$,

$$f_C(t) = \frac{t}{T}, \quad f_R(t) = 0 \qquad (0 < t < T); \qquad f_C(t) = 1, \quad f_R(t) = 1 \qquad (t > T).$$

This example obviously calls for a linear potentiometer in the condenser path and a switch in the resistance path. The weighting function obtained is, by (15),

$$w(t,\lambda) = \begin{cases} \dfrac{1}{t} & 0 < \lambda < t < T \\[1ex] \dfrac{1}{T}\,e^{-(t-T)/T} & 0 < \lambda < T < t \\[1ex] \dfrac{1}{T}\,e^{-(t-\lambda)/T} & 0 < T < \lambda < t. \end{cases}$$

This is illustrated in Figure 8 for $T = 10$, $t = 5, 10, 20$.

[Figure 8. First example of weighting function produced by circuit of Figure 7. Curves for t = 5, 10, 20 with T = 10.]

A second example is furnished by taking

$$\phi(t) = \begin{cases} t^k & 0 < t < T \\ T^k e^{k(t-T)/T} & t > T. \end{cases}$$

Then

$$\frac{\phi(t)}{\dot\phi(t)} = \begin{cases} t/k & 0 < t < T \\ T/k & t > T. \end{cases}$$

Hence, in Figure 7, if $RC = T/k$,

$$f_C(t) = \frac{t}{T}, \quad f_R(t) = 1 - \frac{1}{k} \qquad (0 < t < T); \qquad f_C(t) = 1, \quad f_R(t) = 1 \qquad (t > T).$$

The first example is a special case of this one. The weighting function obtained is, by (15),

$$w(t,\lambda) = \begin{cases} \dfrac{k\lambda^{k-1}}{t^k} & 0 < \lambda < t < T \\[1ex] \dfrac{k\lambda^{k-1}}{T^k}\,e^{-k(t-T)/T} & 0 < \lambda < T < t \\[1ex] \dfrac{k}{T}\,e^{-k(t-\lambda)/T} & 0 < T < \lambda < t. \end{cases}$$

This is illustrated in Figure 9 for $k = 3/2$, $T = 10$, $t = 5, 10, 20$.

[Figure 9. Second example of weighting function produced by circuit of Figure 7.]

A third example is furnished by taking

$$\phi(t) = \begin{cases} \dfrac{Tt}{2T - t} & 0 < t < T \\[1ex] T e^{2(t-T)/T} & t > T. \end{cases}$$

Then

$$\frac{\phi(t)}{\dot\phi(t)} = \begin{cases} t\left(1 - \dfrac{t}{2T}\right) & 0 < t < T \\[1ex] \dfrac{T}{2} & t > T. \end{cases}$$

Hence, in Figure 7, if $RC = T/2$,

$$f_C(t) = \frac{2t}{T}\left(1 - \frac{t}{2T}\right), \quad f_R(t) = \frac{t}{T} \qquad (0 < t < T); \qquad f_C(t) = 1, \quad f_R(t) = 1 \qquad (t > T).$$

The weighting function obtained is, by (15),

$$w(t,\lambda) = \begin{cases} \dfrac{2T(2T - t)}{t\,(2T - \lambda)^2} & 0 < \lambda < t < T \\[1ex] \dfrac{2T}{(2T - \lambda)^2}\,e^{-2(t-T)/T} & 0 < \lambda < T < t \\[1ex] \dfrac{2}{T}\,e^{-2(t-\lambda)/T} & 0 < T < \lambda < t. \end{cases}$$

This is illustrated in Figure 10 for $T = 10$, $t = 5, 10, 20$.

[Figure 10. Third example of weighting function produced by circuit of Figure 7.]

A fourth example is furnished by taking

$$\phi(t) = e^{kt} - 1, \qquad t > 0.$$

Then

$$\frac{\phi(t)}{\dot\phi(t)} = \frac{1 - e^{-kt}}{k}, \qquad t > 0.$$

Hence, in Figure 7, if $RC = 1/k$,

$$f_C(t) = f_R(t) = 1 - e^{-kt}, \qquad t > 0.$$

The weighting function obtained is, by (15),

$$w(t,\lambda) = \frac{k\,e^{-k(t-\lambda)}}{1 - e^{-kt}}, \qquad 0 < \lambda < t.$$

For any value of $t$ this weighting function is exponential in $\lambda$.

14.8 OTHER EXAMPLES

Because there has been no demand for variable networks in the field of communications, the technique of designing practical variable networks is in a very rudimentary stage compared to that of designing fixed networks. In the remainder of this chapter we shall describe some of the circuits which have been developed for specific practical applications.

A memory point method of obtaining smoothed rates, based upon (12), is illustrated below. If $S(t)$, the quantity to be smoothed, represents the time derivative $\dot E(t)$ of the position data $E(t)$, then the average rate is given by

$$V(t) = \frac{E(t) - E(0)}{t}.$$

Under the assumption that the position data, aside from tracking errors, is a linear function of time, the average rate is also the smoothed rate.
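The smoothing produced by the memory point average can be seen in a small numerical sketch. The linear position data, the sinusoidal stand-in for tracking error and all numerical values below are hypothetical, chosen only to compare the memory point rate $[E(t) - E(0)]/t$ with the unsmoothed instantaneous rate.

```python
import math

RATE = 0.02    # true rate of the position data (hypothetical units per second)
EPS = 0.001    # amplitude of the simulated tracking error (hypothetical)
OMEGA = 3.0    # frequency of the simulated tracking error, rad/s (hypothetical)

def position(t):
    """Position data: a linear function of time plus a small tracking error."""
    return 0.5 + RATE * t + EPS * math.sin(OMEGA * t)

def memory_point_rate(t):
    """Average rate since the memory point at t = 0: [E(t) - E(0)] / t."""
    return (position(t) - position(0.0)) / t

def instantaneous_rate(t, h=1e-4):
    """Unsmoothed rate, as a differentiator would give it (central difference)."""
    return (position(t + h) - position(t - h)) / (2.0 * h)

# At this instant the tracking error is passing through zero at its steepest slope.
t0 = 2.0 * math.pi * 3.0 / OMEGA
raw_error = abs(instantaneous_rate(t0) - RATE)

# The memory point error is bounded by EPS / t and so shrinks as time goes on.
smoothed_errors = [abs(memory_point_rate(t) - RATE) for t in (2.0, 5.0, 10.0)]
```

With these values the raw rate error is about `EPS * OMEGA`, while the memory point error never exceeds `EPS / t`, which shows why the smoothing improves with the time elapsed since the memory point was set.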
If the position data is represented by the angular displacement of a shaft in the computer, the quantity $E(0)$ is readily fixed by providing a second shaft which is coupled to the first shaft until $t = 0$ when the coupling is broken. Potentiometers mounted on the shafts are energized by a voltage varying as a function of time in the manner indicated in Figure 11. The manner in which the smoothed rate is obtained is clear.

[Figure 11. Memory point method of obtaining smoothed rate.]

The memory point method of obtaining smoothed rates is used in the T15 antiaircraft director (reference 4). In this application, however, it is somewhat more complicated than in the simple illustration described above. This is due to the fact that the position data and the memory point are in the polar coordinate system, whereas the rate components are referred to a tilted and rotating rectangular coordinate system which is determined by the instantaneous line of sight.

Figure 12 shows a way of securing variable smoothing in a purely electrical circuit.^k Except for the fact that the division of the current through the condensers is varied discontinuously instead of continuously, this circuit corresponds to the first or the second example discussed in Section 14.7.

[Figure 12. Specific limited range time-variable feedback smoothing circuit.]

Figure 13 shows the variable smoothing circuit^l for smoothing first derivatives in the M9A1-E1 antiaircraft director (reference 8). This circuit corresponds approximately to the second example of the differential equation (14) given above. The variable element is a thermistor which is heated up to a high temperature, practically instantaneously, by the heater, and then allowed to cool off naturally.

[Figure 13. Another specific limited range time-variable feedback smoothing circuit.]

[Footnote k: This circuit is due to S. Darlington.]

[Footnote l: Developed by R. F. Wick.]
By choosing the electrical and thermal constants in the circuit correctly the resulting smoothing can be made to approximate that obtained in a memory point circuit.

As noted earlier, all these variable circuits require some auxiliary control means to reset the variable circuits to zero whenever a new target is engaged or the current target makes a sudden change in course. In the T15 memory point system this function was performed by an operator. The operator was aided by a series of meters which compared the instantaneous memory point rates with average rates set in some time previously by hand. The visual indication of a change in course, calling for the selection of a new memory point, was a relatively large, smoothly and decisively varying deflection on the meters. In contrast, normal tracking errors appeared as relatively small random fluctuations of the needles. The circuits of Figures 7 and 12, which were intended for bombsight applications, were also under the control of an operator, who was supposed to start the mechanism at the beginning of each bombing run.

Two control methods were used for the circuit of Figure 13. In one, large changes in rate, corresponding to probable changes in target course, were distinguished by comparing the instantaneous value of the target rate, as obtained directly from a differentiator, with the smoothed value obtained at the output of the smoothing circuit. In the other method, equivalent information was obtained by again differentiating the instantaneous value of the target rate, making a second derivative of the target coordinate. In either case this rate difference or second derivative information was used to control a gas tube, which went off, supplying heating current to the variable thermistor, whenever the voltage applied to it exceeded a certain threshold. This threshold evidently marks the minimum change in course for which the variable network will be reset.
In order to permit the use of a low threshold, without making the circuit unduly liable to false operation because of the effect of tracking errors, the gas tube input voltage was first transmitted through a low-pass filter which suppressed most of the energy due to tracking errors. A considerable amount of work was done on the proportioning of this filter to provide the best protection against false operation with a low threshold and with minimum delay in resetting in case a change of course actually does occur, but the problem remains an interesting subject for research.

APPENDIX A

NETWORK THEORY

This appendix gives a summary of linear network theory which is pertinent to the analysis and design of data-smoothing and prediction circuits. It is incomplete in many respects and should therefore be supplemented by reference to established textbooks on the subject. However, it contains some results which are new.

The present summary will be concerned mainly with fixed linear networks. Variable linear networks will be considered briefly in the last section.

A.1 IMPULSIVE ADMITTANCE

A fixed linear transmission network is one in which the response $V(t)$ is related to the impressed signal $E(t)$ by a linear differential equation of the form

$$b_n \frac{d^n V}{dt^n} + b_{n-1}\frac{d^{n-1} V}{dt^{n-1}} + \dots + b_0 V = a_m \frac{d^m E}{dt^m} + a_{m-1}\frac{d^{m-1} E}{dt^{m-1}} + \dots + a_0 E \tag{1}$$

with constant coefficients. It is well known that the solutions of such a differential equation obey the "superposition principle." This makes it possible to formulate the response of the network to any signal, in terms of its response to certain standard signals.

A convenient standard signal for analytical purposes is the "unit impulse." It may be regarded as the limit of the rectangular pulse shown in Figure 1 as the duration of the pulse is decreased indefinitely while the amplitude is increased in such a way that the area under the pulse is always unity. The limiting function thus defined does not exist in a strict mathematical sense.

[Figure 1. Rectangular pulse signal.]
However, it is very convenient for analytical purposes, and seldom leads to difficulties, to proceed as though the limiting function did exist. An impulse occurring at $t = \lambda$ is conventionally denoted by the singular function $\delta_0(t - \lambda)$ where

$$\delta_0(t) = 0 \ \text{ if } t \ne 0, \qquad \int_{-\infty}^{t} \delta_0(\tau)\,d\tau = \begin{cases} 0 & \text{if } t < 0 \\ 1 & \text{if } t > 0. \end{cases}$$

The response of a fixed network to an impulse or any form of signal is independent of the time at which the signal is applied, provided it is expressed as a function of the time relative to the application of the signal. Let $W(t)$ be the response to the signal $\delta_0(t)$. This is called the "impulsive admittance" of the network. Physically, it must be identically zero for negative values of $t$. For an impulse applied at $t = \lambda$ the response will therefore be $W(t - \lambda)$, which is identically zero for $t < \lambda$.

A physical signal $E(t)$ such as the one shown in Figure 2 may be resolved into an infinite succession of elementary impulses. The strength of the typical elementary impulsive component, such as the one shown in Figure 2 as occurring at time $\lambda$, is $E(\lambda)\,d\lambda$. Its contribution to the response at time $t$ is $E(\lambda)\,W(t - \lambda)\,d\lambda$. Hence the contribution of all the elementary impulsive components of the signal, to the response at time $t$, is given by the formula

$$V(t) = \int_{-\infty}^{+\infty} E(\lambda)\,W(t - \lambda)\,d\lambda. \tag{2}$$

This is one form of the "superposition theorem" for fixed linear networks.

[Figure 2. Derivation of superposition theorem.]

Before discussing the reasons for the limits of integration indicated in (2), it will be helpful to consider a graphical interpretation other than the one used in deriving the integral. Let $W(t)$ be of the form shown in Figure 3, and let $E(\lambda)$ be of the form shown in Figure 4. To determine the response $V(t)$ at a given value of $t$, the curve in Figure 3 is turned over from right to left and placed over the curve in Figure 4 so that its right-hand edge is at $\lambda = t$.
The product of the two curves gives a third curve (not shown), which is identically zero for all $\lambda > t$. The area under the third curve is the response $V(t)$ at the given value of $t$. For progressively larger values of $t$, the curve representing $W(t - \lambda)$ in Figure 4 is simply slid to the right with respect to the curve representing $E(\lambda)$.

[Figure 3. An impulsive admittance.]

[Figure 4. Graphical interpretation of superposition theorem.]

Since a physical signal must certainly be identically zero up to some definite time, or since it must certainly have been applied to the network at some definite time, that time could be taken arbitrarily as zero and (2) could be written in the form

$$V(t) = \int_0^t E(\lambda)\,W(t - \lambda)\,d\lambda. \tag{3}$$

In this form, however, since

$$\int_0^t W(t - \lambda)\,d\lambda$$

is in general a function of $t$, the response could not be interpreted as a weighted average of the signal. On the other hand, since

$$\int_{-\infty}^{+\infty} W(t - \lambda)\,d\lambda = \int_{-\infty}^{+\infty} W(\tau)\,d\tau$$

is independent of $t$, the response may be interpreted as a weighted average of the signal, if

$$\int_{-\infty}^{+\infty} W(\tau)\,d\tau = 1.$$

The necessity of taking the lower limit in (2) as $-\infty$, in order to permit the interpretation of the response as a weighted average of the signal, is also expressed by the point of view that a fixed network cannot make any physical distinction between having no applied signal and having an applied signal which happens to be of zero amplitude.

Another shortcoming of the form (3) or, for that matter, of the form (2) if we set $t$ as the upper limit of integration, comes from the consideration of impulsive admittances of such a nature that $W(t - \lambda)$ has certain kinds of singularities at $\lambda = t$. For example, the case for direct transmission, expressed in the form

$$V(t) = \int_{-\infty}^{t} E(\lambda)\,\delta_0(t - \lambda)\,d\lambda,$$

is ambiguous because the singularity in the integrand occurs exactly at one end of the range of integration. However, the form

$$V(t) = \int_{-\infty}^{+\infty} E(\lambda)\,\delta_0(t - \lambda)\,d\lambda$$

leads, without ambiguity, to the result $V(t) = E(t)$. This example is not trivial.
Every network which transmits infinite frequency must have an impulsive admittance of such a nature that $W(t - \lambda)$ contains a singularity of the form $\delta_0(t - \lambda)$. Any attempt to rule out such a singularity on the ground that physical networks cannot in fact transmit infinite frequency, complicates the analysis and design of networks unduly. If a network is capable of, or is expected to transmit frequencies at the top of the range of interest or importance, it is simpler to assume that the network is capable of, or is expected to transmit all frequencies above that range.

One other advantage of taking the limits of integration as indicated in (2) may be called to attention. Keeping in mind that $E(\lambda)$ is identically zero for all values of $\lambda$ below some definite though perhaps unknown value, and that $W(t - \lambda)$ is identically zero for all values of $\lambda > t$, it is clear that (2) may be integrated partially any number of times without incurring the burden of carrying a string of terms outside of the integral. After one partial integration we have

$$V(t) = \int_{-\infty}^{+\infty} E'(\lambda)\,A(t - \lambda)\,d\lambda \tag{4}$$

where

$$A(t) = \int_{-\infty}^{t} W(\tau)\,d\tau. \tag{5}$$

Since $E'(\lambda)$ is identically zero in the range in which $E(\lambda)$ is identically zero, and since $A(t - \lambda)$ is identically zero for all values of $\lambda > t$, a second partial integration may be performed with no more formal complication than the first partial integration. The fact of the matter is that the terms which ordinarily arise in partial integrations, outside of the integral, are here carried under the integral by singularities of the integrand.

The superposition theorem in the form (4) may be derived directly in a manner similar to the derivation of (2). $A(t - \lambda)$ is the response of the network to a Heaviside unit step function $H(t - \lambda)$ applied at $t = \lambda$, where

$$H(t - \lambda) = \begin{cases} 0 & \text{when } t < \lambda \\ 1 & \text{when } t > \lambda. \end{cases}$$
The signal is resolved into an infinite succession of elementary step functions of amplitude $E'(\lambda)\,d\lambda$ wherever $E(\lambda)$ is continuous, and finite step functions of amplitude $dE(\lambda)$ wherever $E(\lambda)$ has a finite discontinuity. The contribution of each elementary step function to the response at time $t$ is $E'(\lambda)\,A(t - \lambda)\,d\lambda$, that of each finite step function is $A(t - \lambda)\,dE(\lambda)$. Hence, the response is given formally by (4) with the understanding that $E'(\lambda)\,d\lambda$ is to be interpreted as $dE(\lambda)$ wherever $E(\lambda)$ is discontinuous.^a

[Footnote a: Formula (4) may be written in the Stieltjes form $V(t) = \int A(t - \lambda)\,dE(\lambda)$. Alternatively, we may take the point of view that $E'(\lambda)$ contains impulsive singularities wherever $E(\lambda)$ is discontinuous. This point of view is generalized in Appendix B.]

The response $A(t)$ of the network to a Heaviside unit step function $H(t)$ applied at $t = 0$ is called the "indicial admittance" of the network. It is more familiar, in the field of linear transmission theory, than the impulsive admittance to which it is related by (5), but in this monograph preference is given to the use of the impulsive admittance. In the theory of linear differential equations the impulsive admittance is known as a Green's function.

It is often convenient to express the response so that the variable of integration represents the age of the elementary components of the signal. Introducing the age variable

$$\tau = t - \lambda \tag{6}$$

into (2), we have

$$V(t) = \int_{-\infty}^{+\infty} E(t - \tau)\,W(\tau)\,d\tau. \tag{7}$$

In this form it is clear that the weighting of signal components is on the basis of age only. A fixed network may be said to have a memory which is a function only of the age of past events.

In the preliminary stages of designing a smoothing network, the weighting function $W(\tau)$ is generally prescribed to be identically zero when $\tau > T$ say, as well as when $\tau < 0$. This does not violate the conditions of physical realizability.
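The superposition integral in the age-variable form (7) lends itself to a direct numerical sketch. The exponential impulsive admittance below, its time constant `T0`, the truncation of the memory at a finite age and the midpoint rule are all assumptions made for illustration; they are not taken from the text.

```python
import math

T0 = 2.0  # time constant of a hypothetical fixed exponential smoothing network

def W(tau):
    """Impulsive admittance: zero for negative age, unit area for tau >= 0."""
    return math.exp(-tau / T0) / T0 if tau >= 0.0 else 0.0

def response(E, t, memory=40.0, n=4000):
    """V(t) = integral over age tau of E(t - tau) W(tau) d(tau), midpoint rule.

    The integral is truncated at tau = memory, where W is negligibly small.
    """
    h = memory / n
    return sum(E(t - (i + 0.5) * h) * W((i + 0.5) * h) for i in range(n)) * h

def unit_step(t):
    """Heaviside unit step applied at t = 0."""
    return 1.0 if t > 0.0 else 0.0

# Response to a unit step: this is the indicial admittance A(t) = 1 - exp(-t / T0).
step_response = response(unit_step, 3.0)

# A signal that has always been constant is reproduced: the response is a weighted average.
constant_response = response(lambda t: 1.0, 5.0)
```

The step response agrees with the integral of $W$ from zero to $t$, as relation (5) requires, and the constant signal is returned unchanged because the area under $W$ is unity.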
However, such a weighting function cannot be obtained exactly with a network of a finite number of discrete impedance elements. A finite network invariably yields a weighting function with a "tail" which extends to infinity.

A.2 TRANSMISSION FUNCTION

Theoretically, the impulsive admittance of a prescribed network may be determined directly from the differential equations of the network in a perfectly straightforward manner. Practically, however, it is very difficult to do so if the network has more than two meshes. Furthermore, the technical problem of designing a network directly from a prescribed impulsive admittance is even more difficult, particularly if the impulsive admittance is not exactly realizable.

These difficulties may be avoided by recourse to the highly developed methods of network analysis and synthesis used in the field of communication circuits. These methods are based upon the steady-state properties of networks.

If a signal consisting of the single sinusoid $\cos \omega t$ is applied to an invariable or fixed linear transmission network, the steady-state response^b will also be a single sinusoid of the same frequency. The amplitude and phase of the response, relative to the signal, will in general depend upon the frequency. The response may be regarded as the resultant of an "inphase component" proportional to $\cos \omega t$, and a "quadrature component" proportional to $\sin \omega t$, with amplitude coefficients which are functions of the frequency. Furthermore, since the signal is an even function of the frequency,^c the response should also be an even function of the frequency. Hence, the response will be of the form $G(\omega^2) \cos \omega t - \omega H(\omega^2) \sin \omega t$, where $G$ and $H$ are even real functions of frequency.

[Footnote b: This is the response apart from transient components, assuming that the latter vanish exponentially with time after the signal is impressed.]

[Footnote c: The signal is also an even function of the time but this is due only to the particular choice of origin which is arbitrary.]

By a suitable shift of the origin of time it follows that if the impressed signal is $\sin \omega t$, the steady-state response will be of the form $G(\omega^2) \sin \omega t + \omega H(\omega^2) \cos \omega t$.

These two results may be combined into a simpler expression without any loss of individuality. Since $e^{i\omega t} = \cos \omega t + i \sin \omega t$ where $i = \sqrt{-1}$, we have

$$V(t) = \left[G(\omega^2) + i\omega H(\omega^2)\right] e^{i\omega t} \quad\text{if}\quad E(t) = e^{i\omega t}.$$

A further simplification may be achieved by replacing $i\omega$ by $p$, and $G(-p^2) + pH(-p^2)$ by $Y(p)$, so that

$$V(t) = Y(p)\,e^{pt} \quad\text{if}\quad E(t) = e^{pt}. \tag{8}$$

$Y(p)$ is called the "steady-state transmission function" or just "transmission function" for short. Strictly speaking, (8) expresses the relation of steady-state response to signal only if $p = i\omega$. However, it is customarily called a steady-state relation even when $p$ is not a pure imaginary quantity. It may be noted that $Y(p)$ is real when $p$ is real.

The simplicity of steady-state analysis derives from the fact that time occurs in the signal and throughout the network only in the form $e^{pt}$. In particular, the determination of the transmission function is reduced to the solution of simultaneous algebraic equations which do not involve the time factor. For a network in which the signal and the response are related by the linear differential equation (1) with constant coefficients, we obtain simply

$$Y(p) = \frac{a_0 + a_1 p + \dots + a_m p^m}{b_0 + b_1 p + \dots + b_n p^n}.$$

It may be noted that the poles of the transmission function, also referred to as "infinite-gain points" in the $p$-plane, correspond to the roots of the characteristic function of the differential equation. Physical restrictions on the location of infinite-gain points will be considered in Section A.9.

A.3 RELATIONSHIP BETWEEN IMPULSIVE ADMITTANCE AND TRANSMISSION FUNCTION

A relationship between the impulsive admittance and the transmission function of a network may be obtained from (7).
Putting $E(t) = e^{pt}$ when $t > 0$, we get

$$V(t) = e^{pt}\int_{0-}^{t} W(\tau)\,e^{-p\tau}\,d\tau = e^{pt}\int_{0-}^{\infty} W(\tau)\,e^{-p\tau}\,d\tau - e^{pt}\int_{t}^{\infty} W(\tau)\,e^{-p\tau}\,d\tau. \tag{9}$$

The second term in (9) is a transient term due to the fact that we have taken $E(t) \equiv 0$ when $t < 0$. The first term in (9), which involves the time only through $e^{pt}$, is the steady-state term. Comparing this term with (8) we get

$$Y(p) = \int_{0-}^{\infty} W(t)\,e^{-pt}\,dt \tag{10}$$

or, in the notation which will be introduced in the next section,

$$Y(p) = L[W(t)]. \tag{11}$$

A.4 LAPLACE AND INVERSE LAPLACE TRANSFORMS

The frequent use which is made of the Laplace transform and its inverse, in the analysis and design of fixed linear networks, warrants a brief discussion of these transforms.

Given a function $f(t)$ which is identically zero when $t < 0$, its Laplace transform $g(p)$ is defined by the formula

$$g(p) = L[f(t)] = \int_{0-}^{\infty} f(t)\,e^{-pt}\,dt. \tag{12}$$

This is usually written with $0$ for the lower limit, but by having the point $t = 0$ inside the range of integration, instead of at the end, we secure the same advantages for (12) that we gained in the case of (2) by having the point $\lambda = t$ inside the range of integration. Since $f(t)$ is identically zero when $t < 0$ we could write $-\infty$ for the lower limit in (12), but this would run the risk of confusion with the so-called "bilateral Laplace transform." On the whole, it is worth while to have a constant reminder that functions $f(t)$ which are not identically zero when $t < 0$ are ruled out.

The integral in (12) is usually not convergent for all values of $p$. That is, in order to secure convergence of the integral, it may be necessary to assume $R(p) > a$, where $R(p)$ is the real part of $p$, and $a$ is a real number. The result of the integration is a representation of $g(p)$ in the half-plane $R(p) > a$. Since the representation is analytic throughout the half-plane, the principle of analytic continuation allows us to extend the definition of $g(p)$ to the remainder of the $p$-plane.
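Relation (10) between the impulsive admittance and the transmission function can be checked numerically for a simple case. The sketch assumes a hypothetical exponential impulsive admittance $W(t) = e^{-t/T_0}/T_0$, for which the integral in (10) has the closed form $1/(1 + pT_0)$; the function names, the finite upper limit and the midpoint rule are illustrative choices.

```python
import cmath
import math

T0 = 2.0  # time constant of a hypothetical exponential smoothing network

def W(t):
    """Impulsive admittance for t >= 0."""
    return math.exp(-t / T0) / T0

def transmission(p, upper=60.0, n=60000):
    """Y(p) = integral from 0 to infinity of W(t) exp(-p t) dt, midpoint rule.

    The upper limit is truncated where W(t) has become negligible.
    """
    h = upper / n
    return sum(W((i + 0.5) * h) * cmath.exp(-p * (i + 0.5) * h) for i in range(n)) * h

# Steady-state value at the frequency omega = 0.5 rad/s, i.e. p = i * omega.
Y_numeric = transmission(0.5j)
Y_closed_form = 1.0 / (1.0 + 0.5j * T0)

# At p = 0 the transmission equals the area under W, which is unity here.
Y_dc = transmission(0.0)
```

The agreement of `Y_numeric` with `Y_closed_form` is an instance of (11): the transmission function is the Laplace transform of the impulsive admittance.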
Given a function $g(p)$ which is analytic throughout the half-plane $R(p) > c$ where $c$ is a real number, its inverse Laplace transform $f(t)$ is given by the formula

$$f(t) = L^{-1}[g(p)] = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty} g(p)\,e^{pt}\,dp \qquad (13)$$

provided $f(t)$ is identically zero when $t < 0$. If the result of the integration in (13) is not identically zero when $t < 0$, $g(p)$ is not a Laplace transform and the application of the inverse transformation to it is meaningless.

Translation Theorem

A useful theorem can be established at this point. This is the translation theorem. If $G(p) = L[F(t)]$ then

$$L^{-1}[G(p)\,e^{-pa}] = F(t - a)$$

provided that $F(t - a) \equiv 0$ when $t < 0$. Translation is to the right or left according as $a$ is positive or negative. If it happens that $F(t) \equiv 0$ when $t < t_0$ where $t_0 > 0$, then the restriction is that $a \ge -t_0$. That is, a limited amount of translation to the left is permissible. In general, $t_0 = 0$ and the restriction is therefore that $a \ge 0$. This theorem follows readily from (12) or (13).

In all of the applications of (13) which we have any occasion to make in the analysis and design of fixed linear networks, the function $g(p)$ may be resolved into a sum of terms of the form $G(p)\,e^{-pa}$ where $a \ge 0$ and $G(p)$ is a rational algebraic function with real coefficients. Making use of the translation theorem, the problem of evaluating $L^{-1}[g(p)]$ reduces to that of evaluating $L^{-1}[G(p)]$. Now, $G(p)$ may be resolved into a sum of terms of the form $p^m$ or $1/(p - \alpha)^{m+1}$ where $m = 0, 1, 2, \ldots$. We shall consider these two cases separately.

The case $G(p) = p^m$ will be treated by means of (12) and some limiting processes. In Section A.1 the unit impulse was regarded as the limit of a rectangular pulse of duration $T$ and amplitude $1/T$.
By means of (12) the Laplace transform of such a pulse over the interval $0 < t < T$ is

$$\frac{1 - e^{-pT}}{pT}.$$

Hence

$$L[\delta_0(t)] = \lim_{T\to 0}\frac{1 - e^{-pT}}{pT} = 1.$$

Formally therefore

$$L^{-1}[1] = \delta_0(t). \qquad (14)$$

Similarly, the Laplace transform of a pulse over the interval $a < t < a + T$ where $a > 0$ is

$$e^{-pa}\,\frac{1 - e^{-pT}}{pT}.$$

Hence

$$L[\delta_0(t - a)] = \lim_{T\to 0} e^{-pa}\,\frac{1 - e^{-pT}}{pT} = e^{-pa}.$$

Formally therefore $L^{-1}[e^{-pa}] = \delta_0(t - a)$. The last result follows directly from (14) using the translation theorem.

Next, let

$$\delta_1(t) = \lim_{T\to 0}\frac{\delta_0(t) - \delta_0(t - T)}{T}.$$

This is the limiting case, as shown in Figure 5, of two impulses of strengths $1/T$ and $-1/T$ separated by a time interval $T$. It may be called an impulse of second order.

[Figure 5. An impulse doublet.]

By (12) and the previous results

$$L[\delta_1(t)] = \lim_{T\to 0}\frac{1 - e^{-pT}}{T} = p.$$

Formally therefore

$$L^{-1}[p] = \delta_1(t). \qquad (15)$$

Proceeding in this fashion we may define an impulse of $(m + 1)$th order as

$$\delta_m(t) = \lim_{T\to 0}\frac{\delta_{m-1}(t) - \delta_{m-1}(t - T)}{T} \qquad (16)$$

and we may then show that $L[\delta_m(t)] = p^m$. Formally therefore

$$L^{-1}[p^m] = \delta_m(t). \qquad (17)$$

This disposes of the case $G(p) = p^m$ where $m = 0, 1, 2, \ldots$. The case $G(p) = 1/(p - \alpha)^{m+1}$ will be treated by means of (13) and Jordan's lemma.

Jordan's Lemma

If all the singularities of $G(p)$ can be enclosed by a circle of finite radius with center at the origin, and if $G(p) \to 0$ uniformly with respect to $\arg p$ as $|p| \to \infty$, then

$$\lim_{\rho\to\infty}\int_{\Gamma} G(p)\,e^{pt}\,dp = 0$$

where $\Gamma$ is a semicircle of radius $\rho$, with center at the origin, to the right of the imaginary axis if $t$ is negative, to the left of the imaginary axis if $t$ is positive.

By the use of this lemma the contour of integration in (13) may be closed and the integration may then be performed by the method of residues. In the case

$$G(p) = \frac{1}{(p - \alpha)^{m+1}}$$

where $m = 0, 1, 2, \ldots$, we readily obtain

$$L^{-1}\!\left[\frac{1}{(p - \alpha)^{m+1}}\right] = \begin{cases} 0, & t < 0 \\ \dfrac{t^m e^{\alpha t}}{m!}, & t > 0. \end{cases} \qquad (18)$$

An important special case of (18), corresponding to $\alpha = 0$, is

$$L^{-1}\!\left[\frac{1}{p^{m+1}}\right] = \begin{cases} 0, & t < 0 \\ \dfrac{t^m}{m!}, & t > 0. \end{cases} \qquad (19)$$

Another useful theorem which is readily established by means of (12) and (13) is Borel's theorem.
Borel's Theorem

If $g(p)$, $g_1(p)$, $g_2(p)$ are the Laplace transforms of $f(t)$, $f_1(t)$, $f_2(t)$, respectively, and if $g(p) = g_1(p)\cdot g_2(p)$, then

$$f(t) = \int_{0-}^{t+} f_1(t - \lambda)\,f_2(\lambda)\,d\lambda = \int_{0-}^{t+} f_1(\tau)\,f_2(t - \tau)\,d\tau.$$

The functions $f_1(t)$ and $f_2(t)$ are subject to conditions which permit the inversion of the order of integration in the following proof. However, these conditions are seldom of any concern. We have

$$f(t) = L^{-1}\{g_1(p)\cdot L[f_2(t)]\}.$$

Inverting the order of integration and noting that

$$\frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty} g_1(p)\,e^{p(t-\lambda)}\,dp = \begin{cases} 0 & \text{if } \lambda > t \\ f_1(t - \lambda) & \text{if } \lambda < t \end{cases}$$

we obtain the result stated in the theorem.

A.5 ALTERNATIVE EXPRESSION OF THE RESPONSE-TO-SIGNAL RELATIONSHIP

The result (8) obtained in Section A.2 suggests an operational expression of the form

$$V(t) = Y(p)\cdot E(t) \qquad (20)$$

for the response-to-signal relationship whatever the signal $E(t)$ might be. If the equivalence of this operational expression to (2) is taken as a matter of definition we may readily discover the nature of the implied operation.

In the light of Borel's theorem, (2) may be expressed in the form

$$L[V(t)] = L[W(t)]\cdot L[E(t)]$$

under the permissible assumption that $E(t) \equiv 0$ when $t < 0$. Hence

$$V(t) = L^{-1}\{L[W(t)]\cdot L[E(t)]\}$$

or, by (11),

$$V(t) = L^{-1}\{Y(p)\cdot L[E(t)]\}. \qquad (21)$$

This is, therefore, in general the meaning of the operational expression (20).[a]

[a] We note that if $S(p) = L[E(t)]$, the operational expression $V(t) = S(p)\cdot W(t)$ is equivalent to (20). This form is used in Section 10.4 and in Appendix B.

The symmetry of the impulsive admittance is expressed by

$$W(T - t) = W(t).$$

Since $W(t) \equiv 0$ when $t < 0$, it must be so also when $t > T$. Hence

$$Y(p) = \int_{0}^{T/2} W(t)\,e^{-pt}\,dt + \int_{T/2}^{T} W(t)\,e^{-pt}\,dt.$$

By a change of variable of integration the second term may be expressed in the form

$$\int_{0}^{T/2} W(T - t)\,e^{-p(T - t)}\,dt$$

or, because of the symmetry,

$$e^{-pT}\int_{0}^{T/2} W(t)\,e^{pt}\,dt.$$

Hence, if the first term in $Y(p)$ be

$$Y_1(p) = \int_{0}^{T/2} W(t)\,e^{-pt}\,dt$$

we have

$$Y(p) = Y_1(p) + Y_1(-p)\,e^{-pT} = \left[Y_1(p)\,e^{pT/2} + Y_1(-p)\,e^{-pT/2}\right] e^{-pT/2}.$$

At real frequencies ($p = i\omega$) the bracketed factor is evidently an even real function of $\omega$. Hence

$$Y(i\omega) = Q(\omega^2)\cdot e^{-i\omega T/2}. \qquad (24)$$

Apart from discontinuities in the phase angle of the transmission function at real frequencies for which $Q(\omega^2)$ is zero, the phase angle is proportional to frequency. Such a transmission function is referred to as a linear phase transmission function. Sinusoidal components of the signal, of frequencies less than the lowest frequency at which $Q(\omega^2)$ vanishes, suffer phase retardations in transmission in proportion to their frequencies. These components therefore contribute no delay distortion. They are delayed by a uniform amount, just as they are in a properly terminated distortionless, uniform transmission line, although in the case of (24) they contribute amplitude or loss distortion through $Q(\omega^2)$. The delay in (24) is just half of the "smoothing time" $T$.

A.8 SERIES RELATIONSHIPS BETWEEN IMPULSIVE ADMITTANCE AND TRANSMISSION FUNCTION

Two useful series relationships between impulsive admittances and transmission functions will be derived in this section.

Assume that $W(t)$ admits the series expansion

$$W(t) = A_0 + A_1 t + \cdots + A_m\frac{t^m}{m!} + \cdots \qquad (25)$$

for small positive values of $t$. Then by (11) and (19)

$$Y(p) = \frac{A_0}{p} + \frac{A_1}{p^2} + \cdots + \frac{A_m}{p^{m+1}} + \cdots \qquad (26)$$

If $A_0 \ne 0$ the transmission cannot drop off faster than 6 db per octave as the frequency increases indefinitely. If the transmission is to drop off ultimately at the rate of $6k$ db per octave all of the $A$'s up to and including $A_{k-2}$ must be zero. This is to say that the impulsive admittance and all of its derivatives of orders up to and including the $(k - 2)$th must vanish at $t = 0$.

Next, let us suppose that the impulsive admittance and all of its derivatives of orders up to and including the $(k - 2)$th are continuous through all values of $t$ including $t = 0$ except that the $(k - 2)$th derivative is discontinuous only at $t = a$. We may resolve the impulsive admittance into the sum $W_1(t) + W_2(t)$ where $W_1(t)$ and all of its derivatives of orders up to and including the
$(k - 2)$th are continuous through all values of $t$ including $t = 0$, while $W_2(t) \equiv 0$ for all values of $t < a$. Then, for small positive values of $t - a$,

$$W_2(t) = A_{k-2}\,\frac{(t - a)^{k-2}}{(k - 2)!} + \cdots \qquad (A_{k-2} \ne 0)$$

whence, by (19) and the translation theorem, the leading term contributed to the transmission function at high frequencies is $A_{k-2}\,e^{-pa}/p^{k-1}$. Hence the transmission cannot drop off ultimately faster than $6(k - 1)$ db per octave.

We may summarize these results in the asymptotic loss theorem.

Asymptotic Loss Theorem. If the transmission is to drop off ultimately at the rate of $6k$ db per octave as the frequency increases indefinitely, the impulsive admittance and all of its derivatives of orders up to and including the $(k - 2)$th must be continuous through all values of $t$ including $t = 0$.

Discontinuities in $W(t)$ or in some derivative of $W(t)$ cannot occur except at $t = 0$ in the case of physical lumped element networks. Practically, however, rapid changes in $W(t)$ or in some derivative of $W(t)$, at any value of $t$, may be expected to be associated with much the same behavior of the transmission at reasonably high frequencies. As an example consider the case

$$W(t) = e^{-\alpha t} - e^{-\beta t} \quad (\beta > \alpha > 0), \qquad Y(p) = \frac{\beta - \alpha}{(p + \alpha)(p + \beta)}.$$

$W(t)$ is continuous through $t = 0$ as long as $\beta$ is finite but becomes discontinuous there in the limit as $\beta \to \infty$. The first derivative of $W(t)$ is discontinuous through $t = 0$ even when $\beta$ is finite. The ultimate slope of the transmission is 12 db per octave, in accordance with the asymptotic loss theorem, but in the range $\alpha < \omega < \beta$ the transmission appears to have a slope of only 6 db per octave.

The importance of the observations made in the preceding paragraph, in the design of a network, is that if we attempt to approximate a $W(t)$ which has a discontinuity in a derivative of lower order at $t = a$ than at $t = 0$, the fact that the physical approximation must have continuous derivatives of all orders and through all values of $t$ except $t = 0$ is not very significant. The ultimate slope of the transmission may not be reached until the frequency is too high to be of any importance.
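The two slopes quoted for the example $W(t) = e^{-\alpha t} - e^{-\beta t}$ can be checked numerically. The sketch below evaluates $|Y(i\omega)|$ for $Y(p) = (\beta - \alpha)/((p + \alpha)(p + \beta))$ at $\omega$ and $2\omega$ and reports the change in db over that octave. The values $\alpha = 1$ and $\beta = 1000$ and the function names are illustrative assumptions, chosen only to separate the two frequency ranges widely.

```python
import math


def transmission(p, alpha, beta):
    """Y(p) for W(t) = exp(-alpha t) - exp(-beta t)."""
    return (beta - alpha) / ((p + alpha) * (p + beta))


def slope_db_per_octave(omega, alpha, beta):
    """Change in 20 log10 |Y(i w)| between w = omega and w = 2 omega."""
    low = abs(transmission(1j * omega, alpha, beta))
    high = abs(transmission(2j * omega, alpha, beta))
    return 20.0 * math.log10(high / low)


if __name__ == "__main__":
    alpha, beta = 1.0, 1000.0  # assumed values, not taken from the text
    for omega in (30.0, 1.0e5):
        print(omega, round(slope_db_per_octave(omega, alpha, beta), 2))
```

With these assumed values the slope is close to 6 db per octave at $\omega = 30$ (inside $\alpha < \omega < \beta$) and close to 12 db per octave at $\omega = 10^5$, which is the behavior described in the text.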
Another useful relationship between impulsive admittance and transmission function follows from the assumption that

$$\int_{0}^{\infty} t^m\,W(t)\,dt$$

is finite for $m = 1, 2, \ldots$. If we expand the exponential in

$$Y(p) = \int_{0}^{\infty} W(t)\,e^{-pt}\,dt$$

into a power series in $pt$ we get

$$Y(p) = M_0 - M_1 p + \frac{M_2}{2!}p^2 - \frac{M_3}{3!}p^3 + \cdots \qquad (27)$$

where

$$M_m = \int_{0}^{\infty} t^m\,W(t)\,dt. \qquad (28)$$

The quantity $M_m$ is the $m$th moment of the impulsive admittance. When $M_0 = 1$ we speak of the response of the network as a weighted average of the impressed signal, and speak of the impulsive admittance $W(t)$ as the weighting function.

A.9 PHYSICAL RESTRICTIONS ON THE TRANSMISSION FUNCTION

The transmission function $Y(p)$ of a lumped element network is a rational algebraic function of $p$. It is real for real values of $p$ (A.2). Hence, the coefficients must be real, and therefore the roots and poles must either be real or occur in conjugate complex pairs.

Such a function may be expanded into the sum of a polynomial and a rational function whose numerator is of lower degree than the denominator. The latter may therefore be properly expanded into partial fractions. For a partial fraction of the form

$$\frac{1}{(p - \alpha)^m} \quad\text{where } m = 1, 2, \ldots$$

the contribution to the impulsive admittance $W(t)$ is by (18)

$$L^{-1}\!\left[\frac{1}{(p - \alpha)^m}\right] = \frac{t^{m-1}e^{\alpha t}}{(m - 1)!} \qquad (t > 0).$$

For a pair of partial fractions of the form

$$\frac{A + iB}{(p - \alpha + i\beta)^m} + \frac{A - iB}{(p - \alpha - i\beta)^m}$$

the contribution to the impulsive admittance is

$$\frac{2\,t^{m-1}e^{\alpha t}}{(m - 1)!}\,(A\cos\beta t + B\sin\beta t).$$

Since the impulsive admittance is the response to an impulsive signal it is clear that for a stable network the impulsive admittance must be free of terms which increase indefinitely with time, either on account of an amplitude factor of the form $e^{\alpha t}$ where $\alpha > 0$, or, in the event that $\alpha = 0$, on account of an amplitude factor of the form $t^{m-1}$ where $m > 1$. Hence, the physical restrictions on the transmission function are:

1. No poles with positive real parts.
2. Poles on the imaginary $p$ axis must be simple.[b]
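The moment series (27) and (28) can be illustrated with a short numerical check. The sketch below assumes a weighting function $W(t) = A e^{-At}$ (hypothetical, chosen because its moments $M_m = m!/A^m$ and its transmission function $A/(p + A)$ are known in closed form), computes the first few moments by the trapezoidal rule, and compares the truncated series with the exact $Y(p)$ at a small value of $p$. The names and numerical settings are assumptions made for illustration.

```python
import math

A = 2.0  # assumed decay rate of the illustrative weighting function


def W(t):
    """Assumed weighting function W(t) = A e^{-A t}, normalized so M_0 = 1."""
    return A * math.exp(-A * t)


def moment(weight, m, t_max=40.0, steps=100_000):
    """Trapezoidal estimate of M_m, the integral of t^m W(t) from 0 to t_max."""
    h = t_max / steps
    total = 0.5 * ((0.0 ** m) * weight(0.0) + (t_max ** m) * weight(t_max))
    for k in range(1, steps):
        t = k * h
        total += (t ** m) * weight(t)
    return total * h


def series_from_moments(moments, p):
    """Truncated series (27): sum over m of (-1)^m M_m p^m / m!."""
    return sum((-1) ** m * M * p ** m / math.factorial(m)
               for m, M in enumerate(moments))


if __name__ == "__main__":
    moments = [moment(W, m) for m in range(4)]
    p = 0.1
    print(moments)
    print(series_from_moments(moments, p), A / (p + A))
```

Four terms already agree with the exact value to a few parts in a million at $p = 0.1$; the agreement degrades as $|p|$ approaches $A$, where the power series ceases to converge.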
[b] Poles on the imaginary $p$ axis must also be ruled out on the ground that persistent transients cannot be tolerated any more than growing [transients].

The poles of a passive transmission function correspond to modes of free motion.[15h] Each of them may be shown[15i] to satisfy an equation of the form

$$pT + F + \frac{V}{p} = 0$$

where $T$, $F$, $V$ are positive quantities whose values depend upon the particular mode and its activity. However, $T$ is zero in the absence of kinetic energy, $F$ is zero in the absence of energy dissipation, and $V$ is zero in the absence of potential energy. It follows that in the absence of coils or in the absence of condensers, the transmission function must have poles only on the negative real $p$ axis.

For extremely narrow-band, low-pass applications, such as data smoothing, it is not practicable to build networks which call for coils because these generally turn out to be of many thousands of henries in inductance. The exclusion of coils from these applications does not, however, rule out transmission functions with complex poles. These may be realized with RC networks in feedback amplifier circuits as is shown in Chapter 12.

A.10 QUASI-DISTORTIONLESS TRANSMISSION NETWORKS

A quasi-distortionless transmission network is one which is distortionless only in a certain sense. This sense will be made clear in this section.

Let

$$Y(p) = \frac{1 + a_1 p + a_2 p^2 + \cdots + a_m p^m}{1 + b_1 p + b_2 p^2 + \cdots + b_n p^n}. \qquad (29)$$

This may also be written in the form

$$Y(p) = 1 + c_1 p + \frac{c_2}{2!}p^2 + \cdots + \frac{c_r}{r!}p^r + p^{r+1}g(p). \qquad (30)$$

Obviously $g(p)$ will be a rational function with the same denominator as $Y(p)$ and a numerator of $(n - 1)$th degree. If we now apply a signal of the form

$$E(t) = \begin{cases} 0 & \text{for } t < 0 \\ t^r & \text{for } t > 0 \end{cases}$$

the response, by (21), will be

$$V(t) = t^r + r\,c_1 t^{r-1} + \frac{r(r - 1)}{2!}\,c_2 t^{r-2} + \cdots + c_r + r!\,L^{-1}[g(p)] \qquad (t > 0).$$

If the coefficients in the rational expression for $Y(p)$ are such that

$$c_1 = t_f, \quad c_2 = t_f^2, \quad \ldots, \quad c_r = t_f^r \qquad (31)$$

then

$$V(t) = (t + t_f)^r + r!\,L^{-1}[g(p)] \qquad (t > 0).$$
(32)

The second term vanishes exponentially with time. The first term is an advanced or a retarded facsimile of the applied signal according to whether $t_f$ is positive or negative. We shall say that $Y(p)$ is the transmission function of a network which is quasi-distortionless to the signal $t^r$.

Obviously a transmission network which is quasi-distortionless to the signal $t^r$ must also be quasi-distortionless to every signal $t^s$ where $s$ is a positive integer less than $r$, including zero. Hence we may state the quasi-distortionless transmission theorem.

Quasi-Distortionless Transmission Theorem

If the signal

$$E(t) = \begin{cases} 0 & \text{for } t < 0 \\ \text{polynomial of degree } r \text{ at most in } t & \text{for } t > 0 \end{cases}$$

is applied to a "quasi-distortionless transmission network of order $r$," the response will be of the form

$$V(t) = E(t + t_f) + O(e^{-t}) \quad\text{for } t > 0,$$

where $O(e^{-t})$ stands for terms which vanish exponentially with time.

If $t_f > 0$ the transmission network is a predictor for polynomials of degree $r$ at most. However, it does not begin to predict properly until some time has elapsed after the start of the signal, or of a new analytic segment of the signal; that is, until the transients have subsided sufficiently.

If $t_f = 0$ the transmission network may be regarded as a delay-corrected smoother for polynomials of degree $r$ at most. This is obtained simply by taking

$$a_1 = b_1, \quad a_2 = b_2, \quad \ldots, \quad a_r = b_r \qquad (33)$$

in (29).

A.11 VARIABLE LINEAR NETWORKS

A variable linear transmission network is one in which the response $V(t)$ is related to the impressed signal $E(t)$ by the linear differential equation (1) with coefficients which are prescribed functions of $t$. The solutions of such a differential equation also obey the superposition principle. Thus it is possible in this case also to formulate the response of the network to any signal in terms of its response to a standard impulsive signal.
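Returning to the quasi-distortionless conditions (31) and (33) of Section A.10, the coefficients $c_k$ of (30) can be computed from the $a$'s and $b$'s of (29) by ordinary power-series division. The sketch below does this in exact rational arithmetic and then reports the largest $r$ for which $c_k = t_f^k$ holds for all $k \le r$. The function names and the two example networks are hypothetical, introduced only for illustration.

```python
from fractions import Fraction
from math import factorial


def c_coefficients(a, b, count):
    """Return [c_0, ..., c_{count-1}] with Y(p) = sum of c_k p^k / k!.

    a = [1, a_1, ..., a_m] and b = [1, b_1, ..., b_n] are the numerator and
    denominator coefficients of (29), both with leading entry 1.
    """
    y = []  # Taylor coefficients of Y(p) about p = 0
    for k in range(count):
        acc = Fraction(a[k]) if k < len(a) else Fraction(0)
        for j in range(1, k + 1):
            if j < len(b):
                acc -= Fraction(b[j]) * y[k - j]
        y.append(acc)
    return [factorial(k) * y[k] for k in range(count)]


def quasi_distortionless_order(c, t_f):
    """Largest r such that c_k == t_f**k for every k from 1 to r."""
    r = 0
    for k in range(1, len(c)):
        if c[k] != Fraction(t_f) ** k:
            break
        r = k
    return r


if __name__ == "__main__":
    # Delay-corrected smoother: a_1 = b_1, so c_1 = 0 and t_f = 0, as in (33).
    c = c_coefficients([1, 3], [1, 3, 2], 4)
    print(c, quasi_distortionless_order(c, 0))
    # Predictor: a_1 - b_1 = 2, so c_1 = t_f = 2, as in (31) with r = 1.
    c = c_coefficients([1, 5], [1, 3, 2], 3)
    print(c, quasi_distortionless_order(c, 2))
```

In both assumed examples the order is 1: the first network reproduces a ramp without delay after the transient has died out, and the second reproduces it advanced by $t_f = 2$, but neither satisfies the second-order condition $c_2 = t_f^2$.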
The response of a variable network to an impulse or any form of signal depends, however, on the time at which the signal is applied. For an impulsive signal applied at time $\lambda$ the response at time $t$ will be represented by $W(t, \lambda)$. This is still called the "impulsive admittance." In the theory of linear differential equations it is known as a Green's function. Physically, it must be identically zero for $t < \lambda$.

The superposition theorem may now be written in the form

$$V(t) = \int_{0-}^{t+} E(\lambda)\cdot W(t, \lambda)\,d\lambda \qquad (34)$$

provided the network has been properly designed and set into operation at $t = 0$. If

$$\int_{0-}^{t+} W(t, \lambda)\,d\lambda = 1$$

for all values of $t > 0$, the response may be interpreted as a weighted average of the signal.

We note that in order to interpret the response as a weighted average of the signal, it is now no longer necessary to take the lower limit in (34) as $-\infty$, as it was in the case of (2) for a fixed network. In other words, a variable network can be designed and set into operation at any time so that components of the signal which arrive before that time are completely ignored.

The analysis and design of variable linear networks are in general much more difficult than those of fixed linear networks. This is due largely to the fact that there does not yet exist a technique corresponding to the steady-state and operational methods used in connection with fixed networks. However, there is a class of variable networks whose analysis and design are greatly facilitated by the fact that they are related to fixed networks by a transformation of the time variable.

Consider the linear differential equation

$$b_n\frac{d^nV}{dz^n} + b_{n-1}\frac{d^{n-1}V}{dz^{n-1}} + \cdots + b_1\frac{dV}{dz} + V = E$$

with constant coefficients. With appropriate restrictions on the roots of the characteristic function

$$b_n\lambda^n + b_{n-1}\lambda^{n-1} + \cdots + b_1\lambda + 1$$

it represents the response-to-signal relationship in a fixed network, if $z$ is proportional directly to time.
However, if $z$ is a more general function of the time, it will correspond to a variable network.

The kind of transformation which is desired here is one which transforms the range $-\infty < z < +\infty$ into the range $0 < t < +\infty$ with a one-to-one correspondence. Thus, we may take

$$z = \log\theta(t)$$

where $\theta(t)$ is a positive monotonic increasing function of $t$ in the range $0 < t < +\infty$, with $\lim_{t\to 0}\theta(t) = 0$. Several examples of $\theta(t)$, including $\theta(t) = t$, are considered in detail in Chapter 14.

APPENDIX B

THEORETICAL MODIFICATIONS OF SMOOTHING FUNCTIONS TO FIT NONUNIFORM NOISE SPECTRA

Best smoothing or weighting functions have been determined in Chapters 10 and 11 under the assumption of random noise with flat spectrum. It has not been worth while in practice to base the choice of best weighting functions on any more elaborate considerations of actual noise spectra, for at least three reasons:

1. The effectiveness of a smoothing network [...] shape of the weighting function.
2. Noise spectra are subject to variations, due to factors which it is not desirable in practice to attempt to control.
3. Elaborate smoothing functions require elaborate networks with close tolerances on element values.

Nevertheless, the theory of smoothing presented in this monograph would not be complete without showing how more general shapes of noise spectra can be considered. Two methods are presented here, which are generalizations of those presented in Sections 10.3 and 10.4, respectively.

B.1 PHILLIPS AND WEISS THEORY[7]

Let $g(t)$ be the tracking error, and $W(t)$ the impulsive admittance of a smoothing and prediction circuit with smoothing time $T$. Then the error in prediction due to tracking error only, is

$$V(t) = \int_{0}^{T} g(t - \tau)\cdot W(\tau)\,d\tau.$$

The impulsive admittance $W(\tau)$ will depend also upon the time of flight which, for purposes of analysis, is assumed to be constant.
The mean square error is then

$$\overline{V^2} = \lim_{L\to\infty}\frac{1}{2L}\int_{-L}^{L} V^2(t)\,dt = \int_{0}^{T}\!\!\int_{0}^{T} W(\tau_1)\cdot C(\tau_1 - \tau_2)\cdot W(\tau_2)\,d\tau_1\,d\tau_2$$

where

$$C(x) = \lim_{L\to\infty}\frac{1}{2L}\int_{-L}^{L} g(\lambda)\cdot g(\lambda + x)\,d\lambda. \qquad (1)$$

$C(x)$ is the autocorrelation of the error time-function $g(\lambda)$.

For an $n$th order smoothing and prediction circuit $\overline{V^2}$ is now minimized with respect to the impulsive admittance under the restrictions[a]

$$\int_{0}^{T} \tau^m\,W(\tau)\,d\tau = (-t_f)^m \qquad (m = 0, 1, 2, \ldots, n). \qquad (2)$$

Hence $W(\tau)$ must satisfy the integral equation

$$\int_{0}^{T} C(t - \tau)\cdot W(\tau)\,d\tau = k_0 + k_1 t + \cdots + k_n t^n \qquad (0 \le t \le T)$$

where the $k_m$ are constants to be determined. Now, if

$$\int_{0}^{T} C(t - \tau)\cdot W_m(\tau)\,d\tau = t^m \qquad (0 \le t \le T;\ m = 0, 1, 2, \ldots, n) \qquad (3)$$

then

$$W(\tau) = k_0 W_0(\tau) + k_1 W_1(\tau) + \cdots + k_n W_n(\tau). \qquad (4)$$

The procedure is then to determine $C(x)$ from (1), the $W_m(\tau)$ from (3), the $k_m$ from (2) and (4), and finally $W(\tau)$ from (4). It may be noted that, in general, every $k_m$ will be a polynomial of $n$th degree in $t_f$. Hence the $W_m(\tau)$ appearing here are not the same as those defined in Chapter 11, although $W(\tau)$ should be the same if the same [...] is used in Chapter 11.

A difficulty of the theory given above is in the solution of the integral equations (3). This difficulty is avoided in the theory given in the next section. However, the integral equations are easily solved in case of flat random noise, when $C(x)$ is simply an impulse of strength $K$, say, at $x = 0$. Then

$$W_m(\tau) = \frac{\tau^m}{K}, \qquad 0 < \tau < T.$$

Since the strength is irrelevant, it may be taken equal to $T$ so that $W_0(\tau)$ will be normalized.

[a] These follow from the discussions in Sections A.8 and A.10, especially equations (27), (28), (30), and [...].

For a linear prediction circuit it is then found that

$$W(\tau) = 2\left(2 + \frac{3t_f}{T}\right)W_0(\tau) - \frac{6}{T}\left(1 + \frac{2t_f}{T}\right)W_1(\tau).$$

Putting $T = 1$ this may be expressed as

$$W(\tau) = W_0(\tau) + G_1(-t_f)\,W_1(\tau)$$

in terms of the $G_m(\tau)$ and $W_m(\tau)$ of Section 11.3.
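The linear prediction result above can be checked against the restrictions (2) for $m = 0$ and $m = 1$. The sketch below assumes the flat-noise solutions $W_0(\tau) = 1/T$ and $W_1(\tau) = \tau/T$ (that is, $W_m(\tau) = \tau^m/K$ with $K = T$), so that $W(\tau) = (k_0 + k_1\tau)/T$, and evaluates the two moments exactly. The function names are hypothetical and the check is only a consistency test of the stated coefficients.

```python
from fractions import Fraction


def linear_prediction_coefficients(T, t_f):
    """k_0 and k_1 of W = k_0 W_0 + k_1 W_1 for the linear prediction circuit."""
    T, t_f = Fraction(T), Fraction(t_f)
    k0 = 2 * (2 + 3 * t_f / T)
    k1 = -(6 / T) * (1 + 2 * t_f / T)
    return k0, k1


def moments_of_W(T, t_f):
    """Exact zeroth and first moments of W(tau) = (k0 + k1*tau)/T on [0, T]."""
    T = Fraction(T)
    k0, k1 = linear_prediction_coefficients(T, t_f)
    m0 = k0 + k1 * T / 2            # integral of W(tau) d tau
    m1 = k0 * T / 2 + k1 * T ** 2 / 3  # integral of tau W(tau) d tau
    return m0, m1


if __name__ == "__main__":
    for T, t_f in ((1, 3), (2, Fraction(1, 2))):
        print(T, t_f, moments_of_W(T, t_f))
```

For every assumed pair $(T, t_f)$ the zeroth moment is 1 and the first moment is $-t_f$, as (2) requires. Note that $k_1$ is negative for positive $t_f$, so the weighting function falls with $\tau$ and becomes negative toward the older end of the smoothing interval.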
B.2 SYMMETRY OF BEST SMOOTHING FUNCTIONS

The theory of Phillips and Weiss offers the most direct proof that the best smoothing or weighting function must be symmetrical, regardless of the noise power spectrum. The situation is that of minimizing (1) under only one of the restrictions (2), viz., the normalizing condition

$$\int_{0}^{T} W(\tau)\,d\tau = 1. \qquad (5)$$

The weighting function is therefore determined, up to a constant scale factor, by the condition that

$$\int_{0}^{T} C(t - \tau)\cdot W(\tau)\,d\tau = k, \qquad (6)$$

where $k$ is a constant. Substituting $T - t$ for $t$ and $T - \tau$ for $\tau$, we have

$$\int_{0}^{T} C(\tau - t)\cdot W(T - \tau)\,d\tau = k. \qquad (7)$$

Since $C(-x) = C(x)$, and since $W(\tau)$ is determined uniquely by (6) and (5), it follows from (6) and (7) that

$$W(T - \tau) = W(\tau). \qquad (8)$$

B.3 GENERALIZATION OF ELEMENTARY PULSE METHOD

The noise power transmitted through a network may be expressed in the familiar form

$$P = \int_{0}^{\infty} N(\omega^2)\cdot|Y(i\omega)|^2\,d\omega$$

where $N(\omega^2)$ is the noise power spectrum and $Y(p)$ is the transmission function of the network. Assuming that $N(\omega^2)$ is a rational function of $\omega^2$, which is finite at all finite values of $\omega$ including zero, it is possible to determine a rational function $S(p)$, which has no poles on or to the right of the imaginary axis in the $p$-plane with the exception of the point at infinity, and such that

$$|S(i\omega)|^2 = N(\omega^2).$$

It may be readily shown that

$$P = \pi\int_{0-}^{\infty} [F(t)]^2\,dt \qquad (9)$$

where $F(t)$ is related to the impulsive admittance $W(t)$ by the operational equation

$$F(t) = S(p)\cdot W(t). \qquad (10)$$

The problem is now to minimize (9) under the restriction

$$\int_{0-}^{1+} W(t)\,dt = 1, \qquad W(t) \equiv 0 \ \text{when}\ t > 1. \qquad (11)$$

Let

$$S(p) = \mu\,\frac{Q(p)}{R(p)}$$

where

$$Q(p) = (p + \alpha_1)(p + \alpha_2)\cdots(p + \alpha_m), \qquad R(p) = (p + \beta_1)(p + \beta_2)\cdots(p + \beta_n)$$

and $\mu$ is of no consequence. One or more of the $\alpha$'s, but none of the $\beta$'s, may be zero. Since the existence of the integral in (9) imposes the requirement that $F(t)$ have no discontinuities of higher type than finite jumps in the range $0- < t < \infty$, the continuity conditions on $W(t)$ in (10) must depend upon the difference between $m$ and $n$ in the expressions for $Q(p)$ and $R(p)$.
If $m \ge n$, it is fairly obvious that $W(t)$ must be differentiable, in the ordinary sense, exactly $m - n$ times. In other words, $W(t)$ and all its derivatives up to and including the $(m - n - 1)$th must be continuous, but the $(m - n)$th derivative may have finite jumps. If $m < n$ we must consider the introduction into $W(t)$ of discontinuities of higher type than finite jumps. These discontinuities arise in the formal extension of the concept of differentiation to functions containing finite jumps.

If a function $\phi(t)$ has a finite jump of amplitude $A_0$ at $t = a$, the value of $\phi'(t)$ at that point will be indicated formally as $A_0\cdot\delta_0(t - a)$ where $\delta_0(t - a)$ is a unit impulse at $t = a$. If $\phi'(a + 0) - \phi'(a - 0) = A_1$, the value of $\phi''(t)$ at $t = a$ will be indicated formally as $A_0\cdot\delta_1(t - a) + A_1\cdot\delta_0(t - a)$ where $\delta_1(t - a)$ is a unit doublet at $t = a$. And so on, for higher derivatives of $\phi(t)$.

The expression (9) is a minimum under the restriction (11) if $W(t)$ satisfies the differential equation

$$Q(p)\cdot Q(-p)\,W(t) = \text{const.} \quad\text{when } 0 < t < 1 \qquad (12)$$

and $Y(p)$ the condition

$$\frac{1}{2\pi i}\int_{-i\infty}^{i\infty} S(p)\cdot S(-p)\cdot Y(p)\,e^{pt}\,dp = \text{const.} \quad\text{when } 0 < t < 1. \qquad (13)$$

The restriction (11) itself requires that

$$W(t) \equiv 0 \ \text{when}\ t > 1, \quad\text{and}\quad \int_{0-}^{1+} W(t)\,dt = 1. \qquad (14)$$

Case I. ($n = 0$)

The general solution of (12) contains $2m + 1$ constants of integration which are determined by (14) and the $2m$ continuity conditions that $W(t)$ and all of its derivatives up to and including the $(m - 1)$th must vanish at $t = 0$ and $t = 1$.

Case II. ($n \ne 0$, $m \ge n$)

The general solution of (12) contains $2m + 1$ constants of integration which are reduced to $2n$ in number by (14) and the $2(m - n)$ continuity conditions that $W(t)$ and all of its derivatives up to and including the $(m - n - 1)$th must vanish at $t = 0$ and at $t = 1$. The remaining $2n$ constants are determined by (13).

The left-hand member of (13) may be formulated by the method of residues.
The expression for $Y(p)$ should first be separated into two parts so that

$$Y(p) = Y_L(p) + Y_R(p)\,e^{-p}$$

where $Y_L(p)$ and $Y_R(p)$ are rational functions of $p$. The path of integration in (13) is then closed in the left-hand half of the $p$-plane for the first part of $Y(p)$, and in the right-hand half for the second part. Hence, if the sum of the residues of $S(p)\cdot S(-p)\cdot Y_L(p)\,e^{pt}$ in the left-hand half of the $p$-plane be denoted by $\Sigma_L$, and if the sum of the residues of $S(p)\cdot S(-p)\cdot Y_R(p)\,e^{p(t-1)}$ in the right-hand half of the $p$-plane be denoted by $\Sigma_R$, then the condition (13) reduces to

$$\Sigma_L - \Sigma_R = \text{const.} \qquad (15)$$

Case III. ($n \ne 0$, $m < n$)

The $2m + 1$ constants of integration in the general solution of (12) are first increased to $2n + 1$ by appending the $2(n - m)$ singularities

$$\delta_0(t),\ \delta_1(t),\ \ldots,\ \delta_{n-m-1}(t),\quad \delta_0(t - 1),\ \delta_1(t - 1),\ \ldots,\ \delta_{n-m-1}(t - 1)$$

and then reduced to $2n$ by (14). The remainder are determined by (13) or (15). In formulating $Y(p)$ it may be noted that

$$L[\delta_m(t - a)] = e^{-pa}\,L[\delta_m(t)] \qquad (a \ge 0).$$

Example of Case I

Let $S(p) = p^m$. The differential equation (12) requires $W(t)$ to be a polynomial of degree $2m$. The conditions at $t = 0$ require it to have a factor $t^m$, and those at $t = 1$, a factor $(1 - t)^m$. This leaves only (14) to be satisfied. Hence

$$W(t) = \frac{(2m + 1)!}{(m!)^2}\,[t(1 - t)]^m \qquad (0 \le t \le 1)$$

in agreement with (8) of Section 10.8.

Example of Case II

Let

$$S(p) = \frac{p + \alpha}{p + \beta}.$$

Then, by (12),

$$W(t) = A_0 + A_1 e^{-\alpha t} + A_2 e^{\alpha t} \qquad (0 < t \le 1).$$

Hence

$$Y(p) = \frac{A_0}{p} + \frac{A_1}{p + \alpha} + \frac{A_2}{p - \alpha} - \left[\frac{A_0}{p} + \frac{A_1 e^{-\alpha}}{p + \alpha} + \frac{A_2 e^{\alpha}}{p - \alpha}\right] e^{-p}.$$

The expressions for $\Sigma_L$ and $\Sigma_R$, and the relations among the constants under which condition (15) is satisfied, are [illegible]; they involve a quantity $Q$ depending on $\alpha$ and $\beta$ through $\sinh(\alpha/2)$ and $\cosh(\alpha/2)$. Hence, after normalization by (14),

$$W(t) = \frac{1 + Q\cosh\alpha\left(t - \tfrac{1}{2}\right)}{1 + \dfrac{2Q}{\alpha}\sinh\dfrac{\alpha}{2}} \qquad (0 \le t \le 1).$$

In the limit as $\alpha \to 0$,

$$S(p) = \frac{p}{p + \beta}$$

and

$$W(t) = \frac{2 + \beta + \beta^2\,t(1 - t)}{2 + \beta + \beta^2/6} \qquad (0 \le t \le 1).$$

In terms of expressions (12), Section 11.3,

$$W(t) = \frac{W_0(t) + k\,W_1(t)}{1 + k} \qquad (0 \le t \le 1)$$

where $k = \dfrac{1}{6}\left[\dfrac{\beta^2}{2 + \beta}\right]$.

Example of Case III

Let $S(p) = 1/(p + \beta)$. Then, by (12) and the rule for appending singularities in Case III,

$$W(t) = A_0 + A_1\delta_0(t) + A_2\delta_0(t - 1) \qquad (0 \le t \le 1).$$

Hence

$$Y(p) = \frac{A_0}{p} + A_1 - \left(\frac{A_0}{p} - A_2\right) e^{-p}.$$

Condition (15) is satisfied if

$$A_1 = A_2 = \frac{A_0}{\beta}.$$

Hence

$$W(t) = \frac{1 + \dfrac{1}{\beta}\left[\delta_0(t) + \delta_0(t - 1)\right]}{1 + \dfrac{2}{\beta}} \qquad (0 \le t \le 1).$$

This is reminiscent of Stibitz's results mentioned in Section 10.3.

BIBLIOGRAPHY

PART II

1. The Extrapolation, Interpolation and Smoothing of Stationary Time Series with Engineering Applications, Norbert Wiener, OSRD 370, Report to the Services 19, Research Project DIC-6037, The Massachusetts Institute of Technology, Feb. 1, 1942. Div. 7-313.1-M2
1a. Ibid., Chapter 1.
2. The Analysis and Design of Servomechanisms, Herbert Harris, Jr., OSRD 454, Progress Report to the Services 23, The Massachusetts Institute of Technology. Div. 7-321.1-M7
3. Behavior and Design of Servomechanisms, Gordon S. Brown, OSRD 39, Progress Report 2, The Massachusetts Institute of Technology, November 1940. Div. 7-321.1-M1
4. Antiaircraft Director T-15, OEMsr-358, Report to the Services 62, Western Electric Company, Inc., August 1943. Div. 7-112.2-M6
5. The Analysis and Synthesis of Linear Servomechanisms, Albert C. Hall, OSRD 2097, Report to the Services 64, The Massachusetts Institute of Technology, May 1943. Div. 7-321.1-M5
6. Antiaircraft Director, T-15-E1, E. L. Norton, OEMsr-358, Report to the Services 98, Bell Telephone Laboratories, Inc., July 30, 1945. Div. 7-112.2-M11
7. Theoretical Calculation on Best Smoothing of Position Data for Gunnery Prediction, R. S. Phillips and P. R. Weiss, OEMsr-262, AMP Note 11, Report 532, The Massachusetts Institute of Technology, Radiation Laboratory, Feb. 16, 1944. Div. 14-244.4-M1, AMP-703.4-M11
8. A Long Range, High-Angle Electrical Antiaircraft Director [Final Report on T-10], C. A. Lovell, NDCrc-127, Research Project 2, Division 7 Report to the Services 80, Bell Telephone Laboratories, Inc., June 24, 1944. Div. 7-112.2-M9
9. Flight Records of Pitch, Roll, and Yaw, taken in a variety of bombers at Wright Field, Ohio, Sperry Gyroscope Company, 1942-5.
10.
Design and Performance of Data-Smoothing Network, R. B. Blackman, OEMsr-262, Report MM-44-110-38, [Bell Telephone Laboratories, Inc.], July 8, 1944.
11. Computer for Controlling Bombers from the Ground, E. Lakatos and H. G. Och, OEMsr-262, July 24, 1944.
12. A Position and Rate Smoothing Circuit for Ground-Controlled Bombing Computers, R. B. Blackman, OEMsr-262, Report MM-44-110-79, [Bell Telephone Laboratories, Inc.], Aug. 21, 1944.
13. A Two-Servo Circuit for Smoothing Present Position Coordinates and Rate in Antiaircraft Gun Directors, R. B. Blackman, Contract W-30-069-ORD-1448, Report MM-44-110-65, [Bell Telephone Laboratories, Inc.], Sept. 27, 1944.
14. The Theory of Electrical Artificial Lines and Filters, A. C. Bartlett, John Wiley and Sons, Inc., 1931, p. 28.
15. Network Analysis and Feedback Amplifier Design, H. W. Bode, D. Van Nostrand Company, 1945.
15a. Ibid., Chapters 7, 8, 13, and 14.
15b. Ibid., p. 313.
15c. Ibid., p. 326.
15d. Ibid., p. 301.
15e. Ibid., p. 38.
15f. Ibid., p. 12.
15g. Ibid., p. 78.
15h. Ibid., p. 110.
15i. Ibid., p. 133.
15j. Ibid., Chapter 6.
16. Fundamental Theory of Servomechanisms, L. A. MacColl, D. Van Nostrand Company, 1945.
17. Automatic Control Engineering, E. S. Smith, McGraw-Hill Book Company, Inc., 1944.
18. Die Lehre von den Kettenbrüchen, B. G. Teubner, Leipzig, 1918.
19. "Transient Oscillations in Wave Filters," J. R. Carson and O. J. Zobel, Bell System Technical Journal, July 1923.
20. "Harmonic Analysis of Irregular Motion," Norbert Wiener, Journal of Mathematics and Physics, Vol. 5, 1926, pp. 99-189.
21. "Generalized Harmonic Analysis," Norbert Wiener, Acta Mathematica, Stockholm, Vol. 55, 1930, pp. 117-258.
22. "Stochastic Problems in Physics and Astronomy," S. Chandrasekhar, Reviews of Modern Physics, Vol. 15, 1943, pp. 1-89.
23. "Mathematical Analysis of Random Noise," S. O. Rice, Bell System Technical Journal, Vol. 23, 1944, pp. 282-332.
23a. Ibid., Vol. 24, 1945, pp. 46-156.
Cover Sheet for Technical Memoranda
Research Department

SUBJECT: The Transient Behavior of a Large Number of Four-Terminal Unilateral Linear Networks Connected in Tandem - Case 20876
MM-46-110-49
DATE: April 10, 1946
AUTHORS: C. L. Dolph, C. E. Shannon
INDEX NO.: W1.416

ROUTING:
1 - H.W.B[...] - Case Files
2 - Case Files
3 - L. G. Abraham - T. E. Brewer
4 - C. H. Elmendorf - H. K. Krist
5 - H. S. Black - F. B. Anderson
6 - G. N. Thayer - C. W. Harrison
7 - R. L. Dietzold
8 - L. A. MacColl
9 - B. M. Oliver
10 - C. L. Dolph
11 - C. E. Shannon

ABSTRACT

Asymptotic expressions for the transient response of a long chain of four-terminal unilateral linear networks connected in tandem subject to an initial disturbance are developed and classified according to the characteristics of the common transfer ratio. It is shown that a necessary and sufficient condition for the stability of the chain for all n is that the transfer ratio be of the high pass type. The mathematical results are applied to chains of self-regulating telephone repeaters.

The Transient Behavior of a Large Number of Four-Terminal Unilateral Linear Networks Connected in Tandem - Case 20878
MM-46-110-49
April 10, 1946

MEMORANDUM FOR FILE

Introduction

The transient response behavior of a long chain of invariable four-terminal networks connected unilaterally in tandem is of primary importance in the design of cross-country wire communication systems, since the successful operation of such equipment depends upon the rapid damping of transients caused by suddenly applied inputs. While the emphasis in the memorandum will be directed toward coaxial systems consisting of self-regulating repeaters spaced at 3-7 mile intervals and spanning distant points, the results are of a more general nature and would apply, with obvious modifications and corresponding interpretations, to any configuration involving a large number of four-terminal linear invariable networks connected unilaterally in tandem.
It will be shown that there are two fundamentally different types of transient response possible, depending upon the gain characteristic of the transfer ratio of the individual four-terminal linear networks comprising the system. The first type of response, while satisfactory, is difficult to achieve in practice because of the stringent requirements on the gain characteristic of the transfer ratio. The second, a case often encountered in practice, will be shown to be unsatisfactory in general since it leads to build-up and overloading in any physical system comprising a large number of such networks. However, a guiding design principle will be suggested which, it is believed, will enable us to minimize the worst of the effects, and make the successful operation of a system of the type envisaged here possible.

This memorandum is divided into two parts. In the first the problem is defined physically and then formulated mathematically. Following this, the history of the problem is discussed briefly, after which the new results are summarized. Finally, this part concludes with a discussion of their interpretation and implications for the coaxial system. The second part presents the detailed mathematical arguments which led to the new results of part one.

PART I

Statement of the Problem

The analysis in this memorandum is directed toward the understanding of certain anomalous effects which a long chain of self-regulating telephone repeaters may exhibit at its output when the input end of the chain is subject to a transient disturbance (Cf. Figure 1). The gain settings of the repeaters in such a chain are usually controlled by the level of a pilot frequency somewhere in the communication band, and the regulation is designed to compensate for low frequency phenomena (up to approximately one cycle per second) such as the diurnal change in line resistance.
The repeaters in the chain are normally absolutely stable devices, so that any transient which is presented to the input of any one of them will be evanescent in time at the output of that repeater. Since transients are not damped out instantaneously even in absolutely stable devices, a transient disturbance at the input to the first repeater in such a chain will be propagated down the chain. It has been experimentally observed that under certain conditions the maximum amplitude of a transient disturbance may increase as the disturbance is propagated from one repeater to the next, and in some cases there may be many oscillations of sufficiently large amplitude to render the system inoperative because of prolonged overloading. If the entire chain from its input to its output end is considered as a whole, the chain then behaves in many respects like an unstable non-linear device in spite of the fact that each repeater in the chain is absolutely stable.

Since it is obvious that the above type of behavior is at best undesirable in a cross-country link, it is necessary that its cause be thoroughly understood and that all possible steps be taken either to suppress it or, if this is not possible, at least to minimize its effects. Although it is not reasonable to expect that transient oscillations can be kept from propagating down the line, or that it is possible to isolate the line from all transient disturbances, it is reasonable to seek a means of guaranteeing that the transients that are propagated down the line will never possess amplitudes that exceed the magnitude of the original disturbance, or to seek a way to guarantee that the maximum response of the transient oscillations will occur so shortly after the initial disturbance that physical apparatus will be incapable of following it or distinguishing it from the unavoidable initial disturbance.
A way of guaranteeing the first of these will be discussed at length and a suggestion will be made which it is felt will guarantee the second, although no rigorous proof of this last fact has yet been given.

Fig. 2 represents a schematic drawing of a typical satisfactory type of transient response which might result from a unit step input to the first unit of Fig. 1. Fig. 3, on the other hand, represents a schematic drawing of a typical unsatisfactory type of transient response which could result from the same input to a system of the type of Fig. 1 which had different characteristics. Briefly then, the problem to be discussed is that of determining the relationships between the network characteristics and the transient response for networks of the form of Fig. 1.

Mathematical Formulation of the Problem

A sudden change in level in the pilot frequency before the n-th repeater results in the modulation of this frequency, changing it from its normal form

$$A \sin \omega_c t$$

to

$$A \sin \omega_c t \, [1 + f(t)]$$

where $f(t)$ represents the modulation introduced by the transient. After passage through the n-th repeater, this last expression is transformed into

$$A \sin (\omega_c t + \varphi) \, [1 + g(t)],$$

where the repeater and regulator have (possibly) changed the carrier by the addition of the phase angle $\varphi$ and have modified the original envelope $A[1 + f(t)]$ into $A[1 + g(t)]$. It is clear that from the standpoint of regulation it is sufficient to limit discussion to the transformation of $f(t)$ into $g(t)$.*

The exact relationship between $f(t)$ and $g(t)$, of course, depends upon the characteristics of the repeater-regulator circuits, which are in general non-linear. However, for small signal inputs their behavior may be satisfactorily represented by that obtained from a linear invariable four-terminal network.
Thus, the chain of self-regulating repeaters may be replaced, for the purpose of mathematical analysis, by a chain of linear invariable four-terminal networks having a common transfer ratio $y(p)$. The blocks of Fig. 1 will accordingly be idealized as such linear four-terminal networks throughout the analysis.

Because regulation is designed to compensate for low frequency phenomena, certain characteristics that $y(p)$ should possess are known a priori, namely:

(1) $y(p)$ must represent a high-pass system. That is, $y(p) \to 1$ as $p \to \infty$.

(2) $y(0)$ should be zero if, in the terminology of servo theory, there is to be no static error.

In terms of $y(p)$, the design of a self-regulating system reduces to two problems:

(I) Given $y(p)$, to calculate the transient behavior of the chain of self-regulating repeaters.

(II) The design of a system having a $y(p)$ which leads to satisfactory transient behavior.

The rest of the memorandum will be concerned largely with the first of these. The calculations will be carried out in general terms and the different types of possible responses will be described in terms of the characteristics of $y(p)$.

* Transit time between repeaters is neglected throughout this memorandum. More exactly, we choose a different origin of time at each repeater, so that the transit time does not appear explicitly in the formulae.

Mathematically the problem discussed in this memorandum can be formulated as follows: If $y(p)$ represents the common steady-state transfer ratio of the four-terminal linear units shown connected in tandem in Figure 1, the output voltage response of the n-th unit, $V_n(t)$, is given by the inverse Laplace integral

$$V_n(t) = \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} y(p)^n \, e^{pt} \, V_0(p) \, dp \tag{1}$$

where $V_0(p)$ represents the spectrum of the input voltage. For an impulsive input of intensity $V_0$ applied at time $t = 0$, $V_0(p) = V_0$. For a step function input of height $V_0$ applied at time $t = 0$, $V_0(p) = V_0/p$.
Specifically, this memorandum will be devoted to the study of the behavior of $V_n(t)$ for large values of n.

Four-terminal networks are normally classed as low-, band-, or high-pass depending upon the character of $|y(i\omega)|$. Typical examples of $|y(i\omega)|$ are shown in Figure 4a, in which, following the usual practice, $|y(i\omega)|$ has been normalized to be unity at $\omega = 0$ in the low-pass case; at $\omega = \omega_0$ (the mid-band frequency) in the band-pass case; and at $\omega = \infty$ in the high-pass case.

From the viewpoint of the asymptotic behavior of the system in Figure 1, it is convenient to modify this classification somewhat when speaking of the over-all gain characteristic, $|y(i\omega)|^n$, of the transfer ratio of a system comprised of n units. For sufficiently large n, it is clear that $|y(i\omega)|^n$ would lead to curves of the type shown in Figure 4b corresponding to the low-pass, band-pass and high-pass curves of Figure 4a. Thus, for sufficiently large n, the gain curves B', C', and D' of Figure 4b are seen to exhibit the type of behavior normally associated with a band-pass characteristic. A' and E', on the other hand, exhibit behavior of the type normally classified as low-pass and high-pass. For these reasons, the terms low-pass and high-pass will henceforth be reserved for those gain characteristics which are always less than their values at $\omega = 0$ and $\omega = \infty$, respectively. The term band-pass will be used to cover all other cases; namely, those in which $|y(i\omega)|$ possesses one or more maxima at finite frequencies, the values of which exceed the values of $|y(i\omega)|$ at both zero and infinity.

History of the Problem

Several people have considered this problem in the above mathematical form. Before proceeding to a discussion of the results of the general theory, it will be instructive to consider a few illustrative examples of their results. Let*

$$y(p) = \frac{p}{p + 1}. \tag{2}$$

The gain characteristic is clearly of the high-pass type and satisfies conditions (1) and (2) stated above.
If the input voltage is a unit step, then, by the theorem of residues,

$$V_n(t) = \frac{1}{(n-1)!} \left[ \frac{d^{\,n-1}}{dp^{\,n-1}} \left( p^{\,n-1} e^{pt} \right) \right]_{p = -1} = e^{-t} L_{n-1}(t)$$

where $L_{n-1}(t)$ denotes the Laguerre polynomial of degree $(n-1)$. A plot of $V_n(t)$ for n = 1, 2, ..., 10 is shown in Figure 5. It is known that for large n

$$L_n(t) \cong \frac{1}{\sqrt{\pi}} \, e^{t/2} \, (nt)^{-1/4} \cos \left[ 2 (nt)^{1/2} - \frac{\pi}{4} \right]$$

where $\cong$ is to be interpreted as "asymptotically equal to." Thus

$$V_n(t) \cong \frac{1}{\sqrt{\pi}} \, e^{-t/2} \, (nt)^{-1/4} \cos \left[ 2 (nt)^{1/2} - \frac{\pi}{4} \right].$$

* This example was first treated by L. A. MacColl (MM-39-325-166), 9/11/39 and W. H. Wise (MM-38-343-22), 8/2/38. The above treatment follows that of MacColl.

A plot of the approximate "envelope" $e^{-t/2} (nt)^{-1/4}$ is given for n = 50, 100, 150, 200, and 250 in Figure 6. The response in this case is seen to be both amplitude and frequency modulated, the "instantaneous frequency" in the sense of frequency modulation theory being given by

$$\omega = \frac{d}{dt} \left( 2 (nt)^{1/2} \right) = \left( \frac{n}{t} \right)^{1/2}$$

while the envelope of the amplitude modulation is approximately exponential. In particular, the type of behavior found here can be considered satisfactory since there is no tendency for the magnitude of the largest overshoot to increase without limit as the number of repeaters is increased. As will be shown later, this type of behavior is typical of any network having a high-pass characteristic in the generalized sense of that term as it has been defined above.

In MM-40-3500-92 dated 10/14/1940, J. G. Kreer and J. H. Bollman concluded that the appropriate $y(p)$ for a self-regulating repeater employing a directly heated thermistor element in the control device was given by [equation illegible]. It should be observed that for $\alpha \neq 0$ this transfer ratio does possess static error. L. A. MacColl in MM-40-130-270 treated this case for $|\alpha| < 1$ and found that the system exhibited essentially the same type of satisfactory behavior as that discussed above.
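As a numerical illustration of the first example, the sketch below evaluates the step response of a chain of n stages with $y(p) = p/(p+1)$, namely $e^{-t} L_{n-1}(t)$, together with the large-n form $\pi^{-1/2} e^{-t/2} (nt)^{-1/4} \cos[2(nt)^{1/2} - \pi/4]$. It is only a sketch under stated assumptions: the function names, the time grid and the chain lengths are chosen here for illustration and are not taken from the memorandum or its figures, and NumPy's `numpy.polynomial.laguerre.lagval` stands in for tabulated Laguerre polynomials.

```python
import math

import numpy as np
from numpy.polynomial import laguerre


def step_response_exact(n, t):
    """Step response of n stages y(p) = p/(p+1): exp(-t) * L_{n-1}(t)."""
    coeffs = np.zeros(n)
    coeffs[n - 1] = 1.0  # picks out the single Laguerre polynomial L_{n-1}
    return np.exp(-t) * laguerre.lagval(t, coeffs)


def step_response_asymptotic(n, t):
    """Large-n form: pi^(-1/2) exp(-t/2) (nt)^(-1/4) cos(2 sqrt(nt) - pi/4)."""
    t = np.asarray(t, dtype=float)
    nt = n * t
    envelope = np.exp(-t / 2.0) * nt ** -0.25 / math.sqrt(math.pi)
    return envelope * np.cos(2.0 * np.sqrt(nt) - math.pi / 4.0)


# Illustrative grid and chain lengths (assumed values, not read off the figures).
t = np.linspace(0.0, 20.0, 2001)
peaks = {n: float(np.max(np.abs(step_response_exact(n, t)))) for n in (1, 10, 100)}

if __name__ == "__main__":
    for n, peak in peaks.items():
        print(f"n = {n:3d}: largest |V_n(t)| on the grid = {peak:.6f}")
    t_late = np.linspace(1.0, 20.0, 5)
    print("exact      :", step_response_exact(100, t_late))
    print("asymptotic :", step_response_asymptotic(100, t_late))
```

On this grid the largest value of $|V_n(t)|$ is the initial unit jump at t = 0 for each chain length tried, which is in line with the statement that the overshoot does not grow with the number of repeaters.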
(2) A slightly more complicated example is given by

$$y(p) = \frac{p (p + \alpha)}{(p + 1)^2}. \tag{4}$$

It is easily seen that for $\alpha < \sqrt{2}$, $|y(i\omega)|$ is a high-pass characteristic in that $|y(i\omega)| < 1$ for all finite $\omega$ and $|y(i\omega)| \to 1$ as $\omega \to \infty$. On the other hand, if $\alpha > \sqrt{2}$, $|y(i\omega)|$ possesses a maximum greater than 1 at some finite frequency. $|y(i\omega)|$ is illustrated by curve I in Figure 7 for $\alpha = 1.4$ (high-pass) and by Figure 8 for $\alpha = 2$ (band-pass). The response $V_n(t)$ to a unit step function is shown in Figures 9 and 10 for these two cases with n = 1, 2, ..., 9. The character of the response is seen to be of a radically different kind for these two values of $\alpha$. For $\alpha = 1.4$ the response is seen to be of the same type as that encountered in the first example. For $\alpha = 2$, on the other hand, it seems to represent an oscillation in which the magnitude of the largest overshoot is increasing without limit as n tends to infinity. Later it will be shown that this is in fact the case and that satisfactory operation is impossible for a large number of repeaters in this case.

From this and other considerations L. A. MacColl conjectured that a necessary and sufficient condition that the response $V_n(t)$ be bounded for all n was that the transfer ratio $y(p)$ have no net gain at any frequency. Mathematically expressed, a necessary and sufficient condition that

$$|V_n(t)| < M \quad \text{for all } n,$$

where M is independent of n and t, is that

$$|y(i\omega)| \le 1 \quad \text{for all real frequencies } \omega. \tag{M}$$

Physically, the condition on $y(i\omega)$ prevents the transfer ratio $|y(i\omega)|^n$ for a system using n units from having a tremendous gain at any particular frequency.

* This case was also treated by L. A. MacColl, but no memorandum on it was ever written.

In one sense this memorandum could be summarized as a proof of this conjecture. In particular, a direct proof of the necessity of MacColl's condition (M) is given in the second part. The remainder of that part is devoted to an indirect proof of the sufficiency.
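The two regimes of the second example can be checked numerically. The following sketch samples $|y(i\omega)|$ for $y(p) = p(p+\alpha)/(p+1)^2$ on a frequency grid and tests MacColl's condition (M) on that grid. The function names, the grid limits and the tolerance are assumptions introduced for illustration; a sampled test of this kind can only suggest, not prove, that the condition holds at every real frequency.

```python
import numpy as np


def gain(omega, alpha):
    """|y(i*omega)| for y(p) = p (p + alpha) / (p + 1)**2."""
    p = 1j * np.asarray(omega, dtype=float)
    return np.abs(p * (p + alpha) / (p + 1.0) ** 2)


def peak_gain(alpha, omega_max=1.0e3, samples=200_001):
    """Largest sampled gain and the grid frequency at which it occurs."""
    omega = np.linspace(0.0, omega_max, samples)
    values = gain(omega, alpha)
    k = int(np.argmax(values))
    return float(omega[k]), float(values[k])


def satisfies_condition_m(alpha, tol=1e-12):
    """Sampled check of condition (M): |y(i*omega)| <= 1 on the grid."""
    _, peak = peak_gain(alpha)
    return peak <= 1.0 + tol


if __name__ == "__main__":
    for alpha in (1.4, 2.0):
        omega_peak, g_peak = peak_gain(alpha)
        print(f"alpha = {alpha}: peak gain {g_peak:.6f} at omega = {omega_peak:.3f}, "
              f"condition (M) on grid: {satisfies_condition_m(alpha)}")
```

For $\alpha = 2$ the squared gain is $x(x+4)/(x+1)^2$ with $x = \omega^2$, which has its maximum $4/3$ at $x = 2$, so the sampled peak should lie close to $2/\sqrt{3}$ at $\omega = \sqrt{2}$.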
The argument consists in exhibiting the two types of possible responses; the first being that associated with a $y(p)$ satisfying MacColl's condition and the second that resulting from a $y(p)$ which violates it at one or more frequencies.

Statement of Results

The detailed results of the sufficiency argument are discussed conveniently in terms of the generalized characterization of high-, band-, and low-pass $y(p)$'s as given above. The results will be taken up in that order.

High Pass

In terms of the above classification, the class of high-pass $y(p)$'s consists of just those functions which satisfy MacColl's condition and are therefore those from which a satisfactory response could be expected. For the $y(p)$'s in this class, it is clear on physical grounds that the maximum contribution to the response $V_n(t)$ of equation (1) will come from the large values of $|\omega|$, since for these values of $|\omega|$, $|y(i\omega)|^n \cong 1$ while for all other values of $|\omega|$, $|y(i\omega)|^n \to 0$. Using the first three terms of the Laurent expansion of $y(i\omega)$ about $\omega = \infty$, one finds:*

$$y(i\omega) \cong 1 + \frac{ia}{\omega} + \frac{b}{\omega^2}, \tag{5}$$

$$|y(i\omega)| \cong \left[ 1 + \frac{a^2 + 2b}{\omega^2} \right]^{1/2}, \tag{6}$$

$$\text{Angle } y(i\omega) \cong \frac{a}{\omega}. \tag{7}$$

* It is assumed that $a > 0$, $b < 0$, and that $2b + a^2 < 0$. These assumptions correspond to a second order maximum at $|\omega| = \infty$ and to a monotonic decreasing phase function for $y(p)$ as $|\omega| \to \infty$.

If these approximations, which are valid for $|\omega|$ sufficiently large, are introduced into equation (1), it can be shown that the principal contribution to $V_n(t)$ for a unit step input is given by:

$$V_n(t) \cong \pi^{-1/2} (nat)^{-1/4} \exp \left[ \frac{(a^2 + 2b) t}{2a} \right] \cos \left( 2 \sqrt{nat} - \frac{\pi}{4} \right). \tag{8}$$

This, with a suitable interpretation of the constants a and b, is seen to be of the same general form as the response obtained by MacColl for $y(p) = p/(p + 1)$ in the first example above. Just as in that example the response is both frequency and amplitude modulated.
The instantaneous frequency of oscillation is again given by

$$\omega = \frac{d}{dt} \left( 2 \sqrt{nat} \right) = \left( \frac{na}{t} \right)^{1/2}.$$

The gain for $y(p) = p(p + [\text{illegible}])/(p + 1)^2$ is shown on curve I of Figure 11. Curve II of this figure represents $|y(i\omega)|^{100}$ for this $y(p)$. For this example and n = 100, the true gain $|y(i\omega)|^{100}$ and the gain approximation resulting from equation (6) are indistinguishable on the scale of Figure 11. The corresponding phase characteristic for $y(p)^{100}$ is plotted on Figure 12 where, for reasons which will appear in Part II, the actual frequency has been replaced by $\omega' = \omega / \sqrt{n}$. Again, on the scale of Figure 12 the actual phase is indistinguishable from the approximation resulting from equation (7).

Figs. 7 and 13 present the same information for

$$y(p) = \frac{p (p + 1.4)}{(p + 1)^2}$$

and n = 100. Again the agreement between the actual phase and the approximation is excellent. However, there is a considerable error in the gain approximation for small $|\omega|$. This large error is unquestionably due to the fact that the value $\alpha = 1.4$ is near the critical value $\alpha = \sqrt{2}$ at which the characteristic changes from high-pass to band-pass. Agreement with the above asymptotic formula can of course be obtained by increasing n sufficiently. Alternatively, for n = 100, a better approximation to the gain can be obtained by writing

$$y(i\omega) = 1 + \frac{ia}{\omega} + \frac{b}{\omega^2} + \frac{ic}{\omega^3} + \frac{d}{\omega^4}$$

and

$$|y(i\omega)| = \left[ 1 + \frac{2b + a^2}{\omega^2} + \frac{2d + b^2 + 2ac}{\omega^4} \right]^{1/2}.$$

This approximation leads to a curve which is indistinguishable from that of $|y(i\omega)|^{100}$ in Figure 7. With this approximation, one finds the following expression for $V_n(t)$ when the input is a unit step function:

$$V_n(t) \cong \pi^{-1/2} (nat)^{-1/4} \cos \left( 2 \sqrt{nat} - \frac{\pi}{4} \right) \exp \left[ \frac{(a^2 + 2b) t}{2a} \right] \left[ 1 + \frac{(2d + b^2 + 2ac) t^2}{2 a^2 n} + \cdots \right]. \tag{9}$$

This expression is seen to approach that given by equation (8) as $n \to \infty$. Thus one can conclude that the response will always be satisfactory if $y(p)$ belongs to the class of high-pass characteristics.
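The high-pass gain approximation of equation (6) can be tried on the example $y(p) = p(p+\alpha)/(p+1)^2$. In the sketch below the Laurent coefficients $a = 2 - \alpha$ and $b = 2\alpha - 3$ are not quoted from the memorandum; they follow from expanding $(1 + \alpha/p)(1 + 1/p)^{-2} = 1 + (\alpha - 2)/p + (3 - 2\alpha)/p^2 + \cdots$ and setting $p = i\omega$. The function names and sample frequencies are likewise assumptions made for illustration.

```python
import numpy as np


def laurent_coefficients(alpha):
    """a, b in y(i w) ~ 1 + i a / w + b / w**2 for y(p) = p (p + alpha) / (p + 1)**2."""
    return 2.0 - alpha, 2.0 * alpha - 3.0


def gain_exact(omega, alpha):
    """True gain |y(i w)|."""
    p = 1j * np.asarray(omega, dtype=float)
    return np.abs(p * (p + alpha) / (p + 1.0) ** 2)


def gain_first_approximation(omega, alpha):
    """Equation (6): [1 + (a**2 + 2 b) / w**2] ** (1/2), intended for large |w|."""
    a, b = laurent_coefficients(alpha)
    omega = np.asarray(omega, dtype=float)
    return np.sqrt(1.0 + (a * a + 2.0 * b) / omega ** 2)


if __name__ == "__main__":
    alpha, n = 1.4, 100  # the case of Figs. 7 and 13
    a, b = laurent_coefficients(alpha)
    print(f"a = {a:.3f}, b = {b:.3f}, a^2 + 2b = {a * a + 2.0 * b:.4f}")
    for omega in (1.0, 5.0, 20.0, 50.0):  # assumed sample frequencies
        exact = float(gain_exact(omega, alpha)) ** n
        approx = float(gain_first_approximation(omega, alpha)) ** n
        print(f"omega = {omega:5.1f}: |y|^n exact = {exact:.6f}, from (6) = {approx:.6f}")
```

With these coefficients $a^2 + 2b = \alpha^2 - 2$, so the assumption $2b + a^2 < 0$ is the same statement as $\alpha < \sqrt{2}$, and for $\alpha = 1.4$ the quantity is only $-0.04$, which is one way to see why this case sits close to the boundary between high-pass and band-pass behavior.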
Band-Pass Case

MacColl's condition is clearly violated whenever $|y(i\omega)|$ has one or more relative maxima greater than 1 at finite frequencies. For simplicity the case where $|y(i\omega)|$ has only one such maximum at $\omega = \omega_0$ will be treated first. It will furthermore be assumed that this maximum is of the second order; i.e.

$$\frac{d^2}{d\omega^2} |y(i\omega)| \, \Big|_{\omega = \omega_0} \neq 0.$$

Under these conditions, it is physically clear that the maximum contribution to the response $V_n(t)$ as given by equation (1) will be due to those frequencies near $\omega_0$, at which $|y(i\omega)|$ possesses its maximum, since as n increases this region becomes increasingly more important than all the rest. It is also clear that the time of maximum response will be given by the delay time experienced by the frequency $\omega_0$ in passing through the network. This is known to be given by

$$t_0 = -n B'(\omega_0)$$

where $B'(\omega_0)$ denotes the slope of the phase characteristic $B(\omega)$ in the expression

$$y(i\omega) = A(\omega) \exp (i B(\omega)). \tag{10}$$

If $A(\omega)$ and $B(\omega)$ are expanded in a Taylor's series about $\omega = \omega_0$ and terms up to the second order retained, it can be shown that the response to a unit impulse function is given by

$$V_n(t) \cong A(\omega_0)^n \sqrt{\frac{2}{n}} \; G(\omega_0) \exp \left[ -\frac{(t - t_0)^2}{2 n H(\omega_0)} \right] \cos \left[ \omega_0 t + n B(\omega_0) + \cdots \right] \tag{11}$$

where

$$G(\omega_0) = \pi^{-1/2} \left[ \left( \frac{A''(\omega_0)}{A(\omega_0)} \right)^2 + \left( B''(\omega_0) \right)^2 \right]^{-1/4},$$

$$H(\omega_0) = -\frac{ \left( A''(\omega_0) / A(\omega_0) \right)^2 + \left( B''(\omega_0) \right)^2 }{ A''(\omega_0) / A(\omega_0) },$$

$$t_0 = -n B'(\omega_0),$$

and the remaining phase terms in the cosine, including $\theta(\omega_0) = \arctan \{ [\text{illegible}] \}$, are not legible in the scan.

Thus $V_n(t)$ can be interpreted as an amplitude modulated wave with an envelope proportional to the Gauss error curve

$$\exp \left[ -\frac{(t - t_0)^2}{2 n H(\omega_0)} \right]$$

with a standard deviation given by

$$\sigma = \left[ n H(\omega_0) \right]^{1/2} = \left\{ -n \, \frac{ \left( A''(\omega_0) / A(\omega_0) \right)^2 + \left( B''(\omega_0) \right)^2 }{ A''(\omega_0) / A(\omega_0) } \right\}^{1/2}.$$

The standard deviation $\sigma$ is of course a convenient measure of the duration of the disturbance. The maximum response occurs for time $t = -n B'(\omega_0)$, at which time the amplitude is proportional to

$$\frac{A(\omega_0)^n}{\sqrt{n}}.$$

Thus if $A(\omega_0) > 1$, the maximum response will represent a value which is very large compared with unity, the magnitude of the original disturbance, if n is large.
This would force any system involving vacuum tubes to overload if n were sufficiently large. These properties are summarized in Figures (14) and (15). Figure (14) is a plot of the response for values of t near $t_0$ for a few values of n for the example given by equation (4) where $\alpha = 2$. Figure (15) is a plot of the maximum response for a few values of n for different values of the parameter $\alpha$.

It should be remarked that the above approximation to the gain, which was obtained by keeping only the first two terms of the expansion of $A(\omega)$ about $\omega = \omega_0$, could only be expected to be a reasonable one for fairly large values of n, since it represents a usually unsymmetric gain characteristic by a symmetric function. A better or second approximation can be obtained by using three terms of the Taylor's expansion instead of two. Just as in the high-pass case, the retention of this extra term gives rise to a second term in the expression for $V_n(t)$, but it does not fundamentally alter the characteristics of the response since the correction term vanishes for $t = t_0$, at which time the response is still a maximum, with the same amplitude as before. Its only effect is to take cognizance of the unsymmetrical character of the gain characteristic $A(\omega)$ and to change the resulting response envelope to an unsymmetrical one. Of course, it also modifies the phase of the oscillation inside the envelope in a complicated way without changing the fundamental frequency of oscillation.

For these reasons and because of the complexity of the resulting expression, it will not be written down here explicitly, although the explicit approximation to the gain $A(\omega)$ will be discussed in Part II. The two approximations to the gain are illustrated for equation (4) with $\alpha = 2$ in Figure 16 for n = 100. In this case
$$A(\omega) = \frac{|\omega| \sqrt{\omega^2 + 4}}{\omega^2 + 1}.$$

As can be seen from the figure, the second approximation does in fact represent $A(\omega)$ over the significant range of frequencies near $\omega_0$, from which it can be concluded that the response will be unsatisfactory. Figure (14), previously referred to, furnishes a picture of the envelope response as obtained from the first approximation.

In the event that $A(\omega)$ takes on its maximum value at more than one place in the finite frequency range, it is clear that the above results can be generalized as follows: Let $V_{ni}(t)$ be the response of the form given by equation (11) due to a maximum at $\omega = \omega_i$. Let the time of maximum response from this maximum be denoted by $t_i = -n B'(\omega_i)$. Then the total response is clearly given by the expression

$$V_n(t) = \sum_{i=1}^{k} V_{ni}(t),$$

if there are k relative maxima. Unless the values of $A(\omega)$ at the points $\omega = \omega_i$ are nearly the same, it is also clear that only those terms of the above sum which correspond to the largest maxima of $A(\omega)$ will be of significance.

The band-pass case is also discussed briefly for unit step inputs in Part II.

Low Pass Case

Since the low-pass case differs from the band-pass case only in that $A(\omega)$ has its maximum for $\omega = 0$ instead of at $\omega = \omega_0$, the results of the two are very similar. The results in the low-pass case are simpler because it will be recalled that $B(\omega)$ (as defined by equation 10) is an odd function of $\omega$ for any physical network. This forces both $B(0)$ and $B''(0)$ to be zero, so that for an impulsive input one obtains the simple formula

$$V_n(t) = \frac{A(0)^n}{\sqrt{2 \pi n}} \left( -\frac{A''(0)}{A(0)} \right)^{-1/2} \exp \left[ \frac{(t - t_0)^2 A(0)}{2 n A''(0)} \right]. \tag{12}$$

This result corresponds to the well-known formula from transmission line theory for non-distortionless lines.

Remarks

From the practical viewpoint the above results have the following implications for communications systems such as a cross-country coaxial telephone system employing self-regulating repeaters spaced at intervals of a few miles.
(1) If the transfer characteristic of each individual network is of the high-pass type (in the sense in which this term has been used above), then the transient response will never exceed the initial value of the disturbing input voltage and it will be damped out, so that the operation of the communication system would generally be considered satisfactory.

(2) If the network is not of the high-pass type, the usual practical case, and there is any net gain in the system, which is peaked at $\omega_0$, then for even a small number of units the response will exceed the initial input at the time given by

$$t_0 = -n B'(\omega_0), \quad \text{where } A'(\omega_0) = 0,$$

and if the number of units is sufficiently large the output from the n-th unit will be large enough to cause severe overloading.

At first glance these implications are not promising and seem to indicate that the operation of a cross-country system involving several hundred repeaters and regulators would be extremely difficult, since the only satisfactory characteristic is difficult to attain in practice. However, practically the ideal characteristic, which is high pass, can be approached in the sense that the peaked frequency can be made very large. Thus the maximum response may occur so soon after the initial disturbance that the physical system would not be able to follow it or to distinguish it from the initial disturbance, which in many cases would be large enough to cause momentary overloading of the system. Moreover, it is an experimental fact that in the design of feedback regulator characteristics, forcing the peaked frequency higher reduces the size of the peak, which in turn will permit the use of a larger number of regulators in the system. If this is done, the time of maximum response, $t_0 = -n B'(\omega_0)$, will be small since $B'(\omega)$ in general is small for large $\omega$.
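The quantities that enter remark (2) can be evaluated for the band-pass example of equation (4). The sketch below computes the peak frequency $\omega_0$, the delay $t_0 = -nB'(\omega_0)$ and the factor $A(\omega_0)^n / \sqrt{n}$ to which the peak amplitude is proportional. The closed forms used for the peak frequency and the phase slope were worked out here from $y(i\omega) = i\omega (i\omega + \alpha)/(i\omega + 1)^2$ and are assumptions of this sketch, as are the function names; the factor returned is a relative measure only, since the constant of proportionality is not included.

```python
import math


def gain(omega, alpha):
    """A(w) = |y(i w)| for y(p) = p (p + alpha) / (p + 1)**2."""
    return omega * math.sqrt(omega ** 2 + alpha ** 2) / (omega ** 2 + 1.0)


def phase_slope(omega, alpha):
    """B'(w), with B(w) = pi/2 + atan(w / alpha) - 2 atan(w)."""
    return alpha / (alpha ** 2 + omega ** 2) - 2.0 / (1.0 + omega ** 2)


def peak_frequency(alpha):
    """Frequency of the gain maximum; only defined for alpha > sqrt(2)."""
    if alpha ** 2 <= 2.0:
        raise ValueError("no finite-frequency gain peak for alpha <= sqrt(2)")
    return alpha / math.sqrt(alpha ** 2 - 2.0)


def peak_time_and_growth(n, alpha):
    """Return (w0, t0, A(w0)**n / sqrt(n)) for a chain of n units."""
    w0 = peak_frequency(alpha)
    t0 = -n * phase_slope(w0, alpha)
    growth = gain(w0, alpha) ** n / math.sqrt(n)
    return w0, t0, growth


if __name__ == "__main__":
    for n in (10, 50, 100):  # assumed chain lengths
        w0, t0, growth = peak_time_and_growth(n, alpha=2.0)
        print(f"n = {n:3d}: w0 = {w0:.4f}, t0 = {t0:.2f}, A(w0)^n / sqrt(n) = {growth:.3e}")
```

For $\alpha = 2$ this gives $\omega_0 = \sqrt{2}$, $A(\omega_0) = 2/\sqrt{3}$ and $B'(\omega_0) = -1/3$, so the delay grows as $n/3$ while the amplitude factor grows roughly geometrically with n.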
Assuming that the effects of the maximum response have been treated in this way, it is natural to inquire into the type of response which will result for finite values of $t > t_0$. If one examines the gain characteristic curve of the type shown in Figure (7), it is clear that for frequencies less than some frequency $\omega_1$, slightly less than the peak frequency $\omega_0$, the shape is fundamentally like that of the high-pass case. Remembering that the phase delay of a frequency through a linear network is given by the slope of the phase characteristic at that frequency, it is clear that the response for values of t greater than $t_0$, the time of maximum response, will come from the frequencies less than $\omega_0$, since the phase slope characteristic is large for small frequencies and small for large frequencies. Now if it is assumed that the phase characteristic $nB(\omega)$ is a monotonic decreasing function of $\omega$, it is clear that the function $(nB(\omega) + \omega t)$ will always be stationary at an arbitrary frequency $\omega$, provided that t is given a suitable corresponding value. Thus, it is reasonable to expect that the response for $t \gg t_0$* will exhibit the same type of character as that obtained in the high-pass case discussed above. This, it will be recalled, is both frequency and amplitude modulated with an envelope which decreases approximately exponentially. Thus, under these circumstances it seems reasonable to suppose that satisfactory operation of the communication link could be obtained.

To recapitulate, the most practical design for any system of the type envisaged in Figure 1, from the viewpoint of satisfactory transient response, involves approaching the high-pass characteristic as closely as possible by making the gain characteristic of the transfer ratio peak at as high a frequency as is practicable and by keeping the phase slope characteristic monotonic for all smaller frequencies.

PART II

Mathematical Discussion

Theorem I.
A necessary condition that the response $V_n(t)$ from a chain of n four-terminal linear invariable networks subject to a unit step input function have a common finite bound for all n is that the transfer ratio $y(p)$ satisfy the relation

$$|y(i\omega)| \le 1 \quad \text{for all real values of } \omega. \tag{M}$$

* A different type of expansion, valid for any fixed t or $n \to \infty$, is discussed at the end of Part II.

Proof: By hypothesis $|V_n(t)| < M$ for all n, where M is independent of n and t. The Laplace transform of the response is

$$V_n(p) = \int_0^{\infty} e^{-pt} V_n(t) \, dt = \frac{y(p)^n}{p},$$

so that

$$|y(p)|^n = |p| \left| \int_0^{\infty} e^{-pt} V_n(t) \, dt \right| \le |p| \int_0^{\infty} \left| e^{-pt} \right| |V_n(t)| \, dt < |p| \, M \int_0^{\infty} \left| e^{-pt} \right| dt.$$

If $p = c + i\omega$ and if $c > 0$, then

$$|y(p)|^n < \frac{\sqrt{c^2 + \omega^2}}{c} \, M$$

so that

$$\log |y(p)| < \frac{1}{n} \log \left( \frac{\sqrt{c^2 + \omega^2}}{c} \, M \right).$$

Thus, in the limit as $n \to \infty$, it follows that for any p with a positive real part

$$\log |y(p)| \le 0$$

and hence

$$|y(p)| \le 1.$$

Since this relation holds everywhere in the right-hand half plane, it follows from simple continuity considerations that the maximum of $|y(i\omega)|$ never exceeds 1. Thus $|y(i\omega)| \le 1$, as was to be shown.

The remaining discussion will be devoted to the characterization of the different types of possible responses and will, at the same time, furnish an indirect proof of the fact that the condition (M) on $y(p)$ is also sufficient.

High Pass Case - Unit Step Input

If the networks comprising the system shown in Figure 1 possess a transfer ratio having a high-pass gain characteristic in the sense defined above, and if one writes

$$y(i\omega) = A(\omega) \, e^{iB(\omega)}$$

then the gain function $A(\omega)$ satisfies the two conditions

(A) $A(\omega) < 1$ for all finite frequencies $\omega$.

(B) $\lim_{\omega \to \infty} A(\omega) = 1$.

Under these conditions it is clear that, for sufficiently large n, the main contributions to $V_n(t)$ will be due to the high values of $|\omega|$. For convenience, $V_n(t)$ is written here in slightly different form:

$$V_n(t) = \frac{1}{\pi} \, \mathrm{Re} \int_0^{\infty} A(\omega)^n \, e^{i[nB(\omega) + \omega t]} \, \frac{d\omega}{i\omega}.$$

For large values of $|\omega|$, all physical transfer ratios $y(i\omega)$ of interest to us here can be represented by an expansion of the form*

$$y(i\omega) = 1 + \frac{ia}{\omega} + \frac{b}{\omega^2} + \frac{ic}{\omega^3} + \frac{d}{\omega^4} + \cdots. \tag{13}$$

* In the usual case $y(p)$ is a rational function, so that this expansion can be readily obtained.

We shall confine our attention to the ordinary case, in which $a > 0$, $b < 0$ and $2b + a^2 < 0$. For large values of $|\omega|$, we now have

$$A(\omega) = \left\{ \left[ 1 + \frac{b}{\omega^2} + \frac{d}{\omega^4} + \cdots \right]^2 + \left[ \frac{a}{\omega} + \frac{c}{\omega^3} + \cdots \right]^2 \right\}^{1/2}, \tag{14}$$

$$B(\omega) = \arctan \frac{ \dfrac{a}{\omega} + \dfrac{c}{\omega^3} + \cdots }{ 1 + \dfrac{b}{\omega^2} + \dfrac{d}{\omega^4} + \cdots }. \tag{15}$$

It is clear that, for $|\omega|$ sufficiently large, the leading terms of these expressions will furnish adequate approximations to $A(\omega)$ and $B(\omega)$. These are:

$$A(\omega) = \left[ 1 + \frac{a^2 + 2b}{\omega^2} \right]^{1/2}, \tag{16}$$

$$B(\omega) = \frac{a}{\omega}. \tag{17}$$

Let $\omega_0$ be the frequency defined by the condition that these approximations are accurate to within the arbitrarily chosen permissible error $\varepsilon$ for values of $\omega$ such that $\omega > \omega_0$. Then we can write

$$V_n(t) = \frac{1}{\pi} \, \mathrm{Re} \left\{ \int_0^{\omega_0} + \int_{\omega_0}^{\infty} \right\} A(\omega)^n \, e^{i[nB(\omega) + \omega t]} \, \frac{d\omega}{i\omega} = \frac{1}{\pi} \, \mathrm{Re} \, (I_1 + I_2).$$

It is clear that

$$|I_1| \le \int_0^{\omega_0} \frac{A(\omega)^n}{|\omega|} \, d\omega.$$

Since $[A(\omega)]^n \to 0$ for each $\omega$ in the finite range $0 < \omega < \omega_0$, it is clear that $|I_1|$ can be made negligibly small by taking n sufficiently large. Introducing the new variable v defined by the relation

$$v = \omega \sqrt{\frac{t}{na}},$$

$I_2$ can be written as

$$I_2 = \int_{v_0}^{\infty} \left[ 1 + \frac{(a^2 + 2b) t}{n a v^2} \right]^{n/2} e^{i \sqrt{nat} \left( \frac{1}{v} + v \right)} \, \frac{dv}{iv}, \qquad v_0 = \omega_0 \sqrt{\frac{t}{na}}.$$

Letting

$$x = \frac{(a^2 + 2b) t}{a v^2}$$

and using the binomial expansion, one has

$$\left[ 1 + \frac{x}{n} \right]^{n/2} = 1 + \frac{n}{2} \cdot \frac{x}{n} + \frac{1}{2!} \cdot \frac{n}{2} \left( \frac{n}{2} - 1 \right) \left( \frac{x}{n} \right)^2 + \cdots = 1 + \frac{x}{2} + \frac{1}{2} \left( 1 - \frac{2}{n} \right) \left( \frac{x}{2} \right)^2 + \cdots = e^{x/2} + \text{terms in } 1/n.$$

Thus, for sufficiently large n, $I_2$ becomes, approximately,

$$I_2 = \int_{v_0}^{\infty} \exp \left[ \frac{(a^2 + 2b) t}{2 a v^2} \right] e^{i \sqrt{nat} \left( \frac{1}{v} + v \right)} \, \frac{dv}{iv}. \tag{18}$$

In this form the principle of stationary phase can be applied to $I_2$ (Cf. Appendix I); for the amplitude factor

$$\frac{1}{v} \exp \left[ \frac{(a^2 + 2b) t}{2 a v^2} \right]$$

is independent of n,
while the phase function (in the notation of the appendix)

$$\varphi(v) = \frac{1}{v} + v \tag{19}$$

is monotonic in the range of integration on each side of the stationary point ($v = 1$) where $\varphi'(v) = 0$.

Physically speaking, the form of equation (18) suggests the interpretation of $V_n(t)$ as the sum of an infinite number of complex waves whose amplitudes are slowly varying functions of v and whose complex phases are rapidly varying functions of v. Under this interpretation it is physically reasonable to expect that wave interference will occur everywhere except near $v = 1$, where the phase function given by equation (19) is stationary. This is the principle of stationary phase. It remains to evaluate the principal contribution to $I_2$ for values of v near 1. Replacing $\varphi(v)$ by the first three terms of its Taylor's series about $v = 1$,

$$\varphi(v) = \varphi(1) + \varphi'(1)(v - 1) + \frac{\varphi''(1)}{2} (v - 1)^2 = 2 + (v - 1)^2,$$

the main contribution to $I_2$ is given by

$$e^{i \left[ 2 \sqrt{nat} - \frac{\pi}{2} \right]} \int_{1 - \eta}^{1 + \eta} \frac{1}{v} \exp \left[ \frac{(a^2 + 2b) t}{2 a v^2} \right] e^{i \sqrt{nat} \, (v - 1)^2} \, dv.$$

In the interval $(1 - \eta, 1 + \eta)$, the amplitude factor

$$\frac{1}{v} \exp \left[ \frac{(a^2 + 2b) t}{2 a v^2} \right]$$

is substantially constant and may be removed from under the integral sign and evaluated at $v = 1$. By the reasoning of Appendix I, the contributions to the remaining integral are not appreciably affected if the limits are changed to $(-\infty, \infty)$ respectively. Letting $\xi = v - 1$ we can then write $I_2$ in the form

$$I_2 \cong \exp \left[ \frac{(a^2 + 2b) t}{2a} \right] \exp \left\{ i \left[ 2 \sqrt{nat} - \frac{\pi}{2} \right] \right\} \int_{-\infty}^{\infty} e^{i \sqrt{nat} \, \xi^2} \, d\xi.$$

By the known properties of Fresnel integrals

$$\int_{-\infty}^{\infty} e^{i \sqrt{nat} \, \xi^2} \, d\xi = \sqrt{\pi} \, (nat)^{-1/4} \, e^{i \pi / 4}$$

and hence

$$I_2 \cong \sqrt{\pi} \, (nat)^{-1/4} \exp \left[ \frac{(a^2 + 2b) t}{2a} \right] \exp \left\{ i \left[ 2 \sqrt{nat} - \frac{\pi}{4} \right] \right\}.$$

Taking the real part and dividing by $\pi$, the asymptotic expression for $V_n(t)$ is therefore given by:

$$V_n(t) = \pi^{-1/2} (nat)^{-1/4} \exp \left[ \frac{(a^2 + 2b) t}{2a} \right] \cos \left( 2 \sqrt{nat} - \frac{\pi}{4} \right), \tag{20}$$

which is equation (8) of Part I.

A more accurate approximation to the gain $A(\omega)^n$ is given by

$$A(\omega) = \left[ 1 + \frac{2b + a^2}{\omega^2} + \frac{2d + b^2 + 2ac}{\omega^4} \right]^{1/2}$$

where the first three terms of equation (13) have been retained. From this it follows that:

$$A(\omega)^n \cong \exp \left\{ \frac{n}{2} \left[ \frac{2b + a^2}{\omega^2} + \frac{2d + b^2 + 2ac}{\omega^4} \right] \right\} = \exp \left[ \frac{n (2b + a^2)}{2 \omega^2} \right] \exp \left[ \frac{n (2d + b^2 + 2ac)}{2 \omega^4} \right]$$

from which it follows that the second approximation is obtained by multiplying the first by the factor

$$\exp \left[ \frac{n (2d + b^2 + 2ac)}{2 \omega^4} \right].$$

If the frequency transformation $v = \omega \sqrt{t / (na)}$ is now made, the first factor will as before be independent of n. Over the range of integration where the integral is significant their product can be removed from under the integral sign, giving

$$V_n(t) = \pi^{-1/2} (nat)^{-1/4} \cos \left( 2 \sqrt{nat} - \frac{\pi}{4} \right) \exp \left[ \frac{(a^2 + 2b) t}{2a} \right] \exp \left[ \frac{(2d + b^2 + 2ac) t^2}{2 a^2 n} \right]$$

$$\cong \pi^{-1/2} (nat)^{-1/4} \cos \left( 2 \sqrt{nat} - \frac{\pi}{4} \right) e^{\frac{(a^2 + 2b) t}{2a}} \left[ 1 + \frac{(2d + b^2 + 2ac) t^2}{2 a^2 n} + \cdots \right]$$

which is equation (9) of Part I.

Band Pass Case - Impulsive Input

For simplicity let it be assumed that the gain characteristic $A(\omega)$ has only one absolute maximum at $\omega = \omega_0$ on the positive frequency range and that this is a second order maximum. The response $V_n(t)$ can always be written in the form

$$V_n(t) = \frac{A(\omega_0)^n}{\pi} \, \mathrm{Re} \int_0^{\infty} \exp \left\{ n \log \frac{A(\omega)}{A(\omega_0)} + i n B(\omega) + i \omega t \right\} d\omega.$$

In this form, $V_n(t)$ can again be interpreted as being proportional to the sum of an infinite number of complex waves of amplitude

$$\exp \left\{ n \log \frac{A(\omega)}{A(\omega_0)} \right\}$$

with varying complex phase* given by

$$\varphi(\omega, t, n) = n B(\omega) + \omega t.$$

* "Phase" as used here differs from the way it is normally used in engineering.

With this interpretation it is clear that the maximum contribution to $V_n(t)$ will be given by those frequencies in the neighborhood of $\omega_0$, where $\omega_0$ satisfies

$$A'(\omega) = 0,$$

and at values of the time t near $t_0$ at which the phase function $\varphi(\omega, t, n)$ is stationary for the maximum frequency $\omega_0$. Thus $t_0$ is given by

$$t_0 = -n B'(\omega_0).$$

Since $A(\omega_0) \neq 0$ and $A'(\omega_0) = 0$,
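one can write, for a suitably small neighborhood of $\omega_0$,

(21)  $$\log\frac{A(\omega)}{A(\omega_0)} = \frac{A''(\omega_0)}{2A(\omega_0)}(\omega - \omega_0)^2 + \frac{A'''(\omega_0)}{6A(\omega_0)}(\omega - \omega_0)^3 + \cdots$$

If we retain only the first term of this expansion, then for a suitably restricted neighborhood of $\omega_0$, one has

(22)  $$A(\omega)^n = e^{n \log A(\omega)} \approx A(\omega_0)^n \exp\left[\frac{n A''(\omega_0)}{2A(\omega_0)}(\omega - \omega_0)^2\right].$$

Similarly, for $\omega$ sufficiently near $\omega_0$,

(23)  $$B(\omega) = B(\omega_0) + B'(\omega_0)(\omega - \omega_0) + \frac{B''(\omega_0)}{2}(\omega - \omega_0)^2.$$

Henceforth for simplicity, we shall write $A = A(\omega_0)$, $A'' = A''(\omega_0)$, $B = B(\omega_0)$, $B' = B'(\omega_0)$, $B'' = B''(\omega_0)$. If these approximations are valid in the neighborhood $(\omega_0 - \Delta,\ \omega_0 + \Delta)$ it follows that

$$V_n(t) = \frac{1}{\pi}\,\mathrm{Re}\left\{\left[\int_0^{\omega_0 - \Delta} + \int_{\omega_0 + \Delta}^{\infty}\right] A(\omega)^n e^{i[nB(\omega) + \omega t]}\,d\omega + A^n \int_{\omega_0 - \Delta}^{\omega_0 + \Delta} \exp\left\{\frac{nA''}{2A}(\omega - \omega_0)^2 + i\left[nB + nB'(\omega - \omega_0) + \frac{nB''}{2}(\omega - \omega_0)^2 + \omega t\right]\right\} d\omega\right\}.$$

Since $[A(\omega)]^n \to 0$ as $n \to \infty$, except near $\omega = \omega_0$, it follows as before that the sum of the bracketed integrals can be made negligibly small in comparison with the remaining one if $n$ is taken sufficiently large. Recalling that

$$t_0 = -nB'(\omega_0),$$

the remaining integral can be written as

$$V_n(t) = \frac{1}{\pi}\,\mathrm{Re}\left\{A^n e^{i(nB + \omega_0 t)} \int_{\omega_0 - \Delta}^{\omega_0 + \Delta} \exp\left[\frac{n}{2}\left(\frac{A''}{A} + iB''\right)(\omega - \omega_0)^2 + i(t - t_0)(\omega - \omega_0)\right] d\omega\right\}.$$

Again the finite limits of integration can be replaced by $-\infty$ and $\infty$ since, for large $n$, $\exp\left[\frac{nA''}{2A}(\omega - \omega_0)^2\right]$ will be small except in the immediate neighborhood of $\omega_0$. If one sets

$$\rho = -\frac{n}{2}\left(\frac{A''}{A} + iB''\right), \qquad p^2 = i^2(\omega - \omega_0)^2, \qquad g = t - t_0,$$

then the remaining integral can be recognized as pair No. 710.0 of the Campbell and Foster Tables. Then one finds

$$V_n(t) = \frac{1}{\pi^{1/2}}\,\mathrm{Re}\left\{\frac{A^n \exp(inB + i\omega_0 t)}{\sqrt{\rho}} \exp\left[-\frac{(t - t_0)^2}{4\rho}\right]\right\}.$$

The result is equivalent to that given by equation (11) of Part I.

If $A(\omega_0)$ is greater than 1, it is thus seen that the response will have a maximum value that builds up very rapidly as $n$ increases and would eventually force any system involving vacuum tubes to overload.

It should be remarked that the above approximation to the gain could only be expected to be a reasonable one for fairly large values of $n$, since it represents a usually unsymmetric gain characteristic by a symmetric function.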
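The closed form quoted from the Campbell and Foster tables rests on a Gaussian integral with a complex coefficient. The following Python fragment is only a rough numerical check, not part of the memorandum: it compares a trapezoidal estimate of $\int \exp(-\rho x^2 + igx)\,dx$ with $\sqrt{\pi/\rho}\,\exp(-g^2/4\rho)$, taking $x = \omega - \omega_0$, $g = t - t_0$ and $\rho = -\frac{n}{2}(A''/A + iB'')$. The function names and the sample values of $n$, $A$, $A''$, $B''$ and $g$ are illustrative assumptions.

```python
import cmath
import math


def gaussian_integral_numeric(rho, g, half_width=12.0, steps=24000):
    """Trapezoidal estimate of the integral of exp(-rho*x**2 + 1j*g*x) over
    [-half_width, half_width]; the tails beyond are negligible when Re(rho) > 0."""
    h = 2.0 * half_width / steps
    total = 0j
    for k in range(steps + 1):
        x = -half_width + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * cmath.exp(-rho * x * x + 1j * g * x)
    return total * h


def gaussian_integral_closed(rho, g):
    """Closed form sqrt(pi/rho) * exp(-g**2 / (4*rho)), principal square root."""
    return cmath.sqrt(math.pi / rho) * cmath.exp(-g * g / (4.0 * rho))


# Hypothetical values (assumptions for illustration): n stages, gain maximum A
# with curvature A2 < 0, phase curvature B2, and time offset g = t - t0.
n, A, A2, B2, g = 20, 1.0, -0.05, 0.03, 1.2
rho = -(n / 2.0) * (A2 / A + 1j * B2)   # Re(rho) > 0 because A2 < 0

numeric = gaussian_integral_numeric(rho, g)
closed = gaussian_integral_closed(rho, g)
print(abs(numeric - closed))            # difference between the two evaluations
```

The check only makes sense when the real part of $\rho$ is positive, which corresponds to the gain having a true maximum ($A'' < 0$) at $\omega_0$.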
A better or second approximation can be obtained by keeping the second term of the expansion of the logarithm in (21), and then taking the first term of the expansion of $\exp\left[\frac{nA'''}{6A}(\omega - \omega_0)^3\right]$. This yields

$$A(\omega)^n \approx A^n \exp\left[\frac{nA''}{2A}(\omega - \omega_0)^2\right]\left[1 + \frac{nA'''}{6A}(\omega - \omega_0)^3\right].$$

The addition of the second term in the above expression gives rise to an additional term in $V_n(t)$, provided that the same phase approximation (23) is retained. The resulting $V_n(t)$ is similar to (11) but the new envelope consists of the old envelope plus $nA'''/6A$ times the third derivative of the old envelope. The modulated frequency remains the same but the phase is changed in a complicated manner. (Compare pair 710.3 of the Campbell and Foster tables.)

Unit Step Input

In this case one can write

$$V_n(t) = \frac{1}{\pi}\,\mathrm{Re} \int_0^{\infty} A(\omega)^n\, e^{i[nB(\omega) + \omega t - \pi/2]}\,\frac{d\omega}{\omega}.$$

As before the only significant frequencies are in the neighborhood of $\omega = \omega_0$, and near this point the $1/\omega$ in the denominator can be taken out of the integral as $1/\omega_0$, provided $\omega_0 \neq 0$. Thus the result will be the same as for the impulsive input apart from the factor $1/\omega_0$ if one makes $nB(\omega) - \pi/2$ correspond to $nB(\omega)$ in (11).

Low-Pass Case

It is clear that the analysis for this case, in which the equation $A'(\omega) = 0$ is satisfied for $\omega = 0$, can be carried through in exactly the same manner as the band-pass case treated previously. The resulting answer is capable of simplification, however, if it is recalled that $B(\omega)$ for any physical network is an odd function of $\omega$. This forces both $B(0)$ and $B''(0)$ to be zero. The resulting formulae then become

a) Impulsive Input

b) Unit Step Input

(24)  [The formulas for cases a) and b) are illegible in the scan. The legible fragments show, for a), a Gaussian pulse in $t - t_0$ with amplitude proportional to $A(0)^n$ and width set by $nA''(0)/A(0)$, and for b), the integral of such a pulse with respect to $t$.]

This last expression involves an integral since it is necessary to eliminate the pole at zero where $A(\omega)$ has its maximum.
This can be done by differentiating $V_n(t)$ with respect to $t$, finding the asymptotic formula for $V_n'(t)$ as before and then integrating to obtain (24).

Hamy's Expansions in the Band-Pass Case

The type of asymptotic expansions so far given for the band-pass case were explicitly designed to represent $V_n(t)$ in the neighborhood of $t = t_0$ where $V_n(t)$ is a maximum. They could in no sense be considered the true asymptotic expansions for values $t \ll t_0$ or $t \gg t_0$. In particular their derivation depended upon the fact that the time of maximum response was related to the number of four terminal networks by means of the equation $t_0 = -nB'(\omega_0)$, so that as $n \to \infty$, $t_0 \to \infty$. Other types of expansion are clearly possible. Two obvious alternatives are:

(1) Those valid for fixed $n$ as $t \to \infty$;
(2) Those valid for fixed $t$ as $n \to \infty$.

The first of these will not be considered here since they are of little interest, as all of the four terminal networks have been assumed to be absolutely stable. The interested reader is referred to the book by Doetsch on Laplace Transformations for expansions of this type. Since the second type of expansion is of interest here and is not to be found in most of the standard reference works it will be discussed here briefly.

In a classic paper, M. Hamy* derived general expansions of this type for complex integrals of the form

$$\int f(z)\,\varphi^n(z)\,dz$$

under a variety of hypotheses on $f(z)$ and $\varphi(z)$. These conditions include the case where $\varphi(z)$ has a saddle point given by the solution of $\varphi'(z) = 0$, and the result of this case is a generalization of the often-used theorem of Fowler which one finds in his book on statistical mechanics under the title of the saddle point method. More to the point, they also include the case where $\varphi(z)$ has one or more maxima on the path of integration at which $\varphi'(z) = 0$, provided that $f(z)$ admits a Taylor series expansion about these points.

* Journal de Mathématiques, vol. 4, 6th series, 1908, page 203.
In particular, then, if one considers $t$ as a fixed parameter they apply to the integral of equation (1), with $c = 0$ and $\varphi(z) = y(p)$; $f(z) = e^{pt}\,v_0(p)$. In terms of our notation, one finds that:

(a) for an impulsive input with gain maxima at $\omega = \omega_0$

$$V_n(t) \approx \frac{2A^n(\omega_0)}{nB'(\omega_0)} \cos\left[\omega_0 t + nB(\omega_0)\right] + \text{terms of higher order in } 1/n;$$

(b) for a unit step input with gain maxima at $\omega = \omega_0 \neq 0$

$$V_n(t) \approx \frac{2A^n(\omega_0)}{n\,\omega_0\,B'(\omega_0)} \cos\left[\omega_0 t + nB(\omega_0)\right] + \text{terms of higher order in } 1/n.$$

It is interesting to note that these formulae indicate a dependence upon $1/n$ instead of $1/\sqrt{n}$ as in the case of the previous expansion. These formulae can be thought of as representing the response in the band-pass case for any fixed $t$, $t \ll t_0$.

Appendix I

Certain remarks of Aurel Wintner* on the justification of the principle of stationary phase are pertinent enough to the above discussion to bear repetition here. In order for the integral

(25)  $$\int f(x)\,e^{ip\varphi(x)}\,dx$$

to be asymptotically represented as $p \to \infty$ by the formula (Cf. Lamb, Hydrodynamics, p. 395)

(26)  $$\frac{\sqrt{2\pi}\,f(\alpha)}{\sqrt{p\,|\varphi''(\alpha)|}}\; e^{i[p\varphi(\alpha) \pm \pi/4]}$$

where $\varphi'(\alpha) = 0$ and where the upper or lower sign is to be taken according as $\varphi''(\alpha)$ is positive or negative, it is evident that two things are sufficient.

(1) The contribution to the integral outside a small interval around the stationary value $\alpha$ of $\varphi(\alpha)$ must decrease more rapidly as a function of $p$ than the one obtained in the neighborhood of $\alpha$;
(2) The asymptotic formula given above must adequately represent the behavior of the contribution to the integral from the neighborhood of the stationary value $\alpha$.

Now, if, on any closed interval $I$, $\varphi'(x)$ is continuous and has no zeros, and if $\varphi(x)$ is strictly monotone in this interval, then $z = \varphi(x)$ can be introduced as a variable of integration on that interval, transforming $S$ into

$$S = \int_I f(x)\,e^{ip\varphi(x)}\,dx = \int f[\varphi^{-1}(z)]\,e^{ipz}\,dz.$$

* Method of Stationary Phase, Journal of Math. & Physics, vol. 24, no. 3-4, 1945.

If, in addition to the above, $\varphi'(x)$ and $\varphi''(x)$ are continuous and if $f(x)$ and $f'(x)$ exist and are continuous, this last integral can be integrated by parts, giving

$$S = \left[\frac{f[\varphi^{-1}(z)]\,e^{ipz}}{ip}\right] - \frac{1}{ip}\int e^{ipz}\,\frac{d}{dz} f[\varphi^{-1}(z)]\,dz,$$

and showing that on any such interval $I$, $S = O(1/p)$. Thus, condition (1) will be satisfied if, in the neighborhood of the stationary point $\alpha$, the contribution to the integral is greater than $O(1/p)$. This is clearly the case when the asymptotic formula (26) is valid, since there the dependence on $p$ is as $1/\sqrt{p}$. It can be shown that (26) is valid whenever $\varphi'(\alpha) = 0$, $\varphi''(\alpha) \neq 0$, and $\varphi''(x)$ and $f[\varphi^{-1}]$ are of bounded variation in the neighborhood of the stationary value.

Thus, to recapitulate, under these conditions, the maximum contribution comes from the stationary point and depends on $p$ as $1/\sqrt{p}$, while the points which are not near the stationary point contribute terms depending upon $p$ only as $1/p$.

To conclude this brief appendix, it should be remarked that Wintner gives an extension of (10) which is valid under the same condition on $f[\varphi^{-1}(z)]$ if the first $n$ derivatives of $\varphi(x)$ vanish at some point $\alpha$ while $\varphi^{(n+1)}(x)$ does not. These results could be used to extend the treatment of the high-pass case given above to the cases in which $a^2 + 2b = 0$, etc.

C. L. DOLPH
C. E. SHANNON

Att. B-392415 to 392428

[Figure residue: FIG. 3 and FIG. 16, response curves labeled "1ST APPROX." and "2ND APPROX." for n = 50 and n = 100, scale in db; the plots are not legible in the scan.]

Electronic Methods in Telephone Switching

C. E. Shannon

In the recent development of electronic digital computing machines various new tubes and other electronic devices have been designed which may be of use in machine switching. In particular the "selectron" tube developed by R. C. A.
and the mercury acoustic delay tank provide large cheap memory devices in which information can be registered or read off in electronic time intervals (of the order of microseconds). Since one of the chief functions of the relays and switches in a telephone exchange is that of memory (e.g. the relays remember which calling and called lines should be connected together) it is worth while considering the possibility of using such tubes to replace ordinary electro-mechanical switching equipment.

Suppose we have an exchange (or set of exchanges) serving $n$ subscribers and that the exchange can handle a peak load of $m$ simultaneous conversations. These may be between any $m$ pairs of the subscribers. Thus the exchange must be capable of assuming as many different states as there are ways of selecting $m$ pairs of objects from $n$. This can be done in

$$\frac{n!}{m!\;2^m\,(n - 2m)!}$$

different ways. For $n$ and $m$ large the logarithm of this is approximately $2m \log n$. If the logarithm is to the base ten then this is the required memory capacity of the exchange measured in decimal digits. If the logarithmic base is two the units are binary digits.

A single two-position relay has a capacity of $\log 2$ units (one binary digit or .30103 decimal digits), while $s$ relays have $s \log 2$ units. A 10 x 10 crossbar switch has a capacity of $10 \log 10$, while a single commutator on a panel has capacity $\log r$, where $r$ is the number of vertical positions of the brushes. Hence the number of relays required for a pure relay exchange would be

$$\frac{2m \log n}{\log 2},$$

the number of 10 x 10 crossbars would be

$$\frac{2m \log n}{10 \log 10},$$

etc. To these estimates must be added the losses due to inefficient use of the memory and also the memory of equipment used for functions other than merely remembering which connections are being held at a given time.

An ordinary relay is capable of remembering (by a holding circuit) one binary digit. A pair of vacuum tubes in a flip-flop circuit has the same memory capacity.
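The capacity estimates above can be tried on concrete numbers. The sketch below is an illustration only: the function names and the exchange size (10,000 subscribers, 1,000 simultaneous conversations) are assumptions, not figures from the memorandum. It evaluates the exact logarithm of the number of states with `math.lgamma` and compares it with the approximation $2m \log n$, which is always an upper bound because $n!/(n-2m)! \le n^{2m}$.

```python
import math


def log_pairings(n, m, base=2.0):
    """Logarithm (to the given base) of n! / (m! * 2**m * (n - 2*m)!),
    the number of ways of choosing m disjoint pairs from n subscribers."""
    if 2 * m > n:
        raise ValueError("need n >= 2m")
    ln_count = (math.lgamma(n + 1) - math.lgamma(m + 1)
                - m * math.log(2.0) - math.lgamma(n - 2 * m + 1))
    return ln_count / math.log(base)


def relays_estimate(n, m):
    """The text's estimate 2m log n / log 2 for a pure relay exchange."""
    return 2 * m * math.log(n) / math.log(2.0)


def crossbars_estimate(n, m):
    """The text's estimate 2m log n / (10 log 10) for 10 x 10 crossbars."""
    return 2 * m * math.log(n) / (10 * math.log(10.0))


n, m = 10_000, 1_000          # hypothetical exchange size and peak load
exact_bits = log_pairings(n, m)
approx_bits = 2 * m * math.log2(n)
print(exact_bits, approx_bits, relays_estimate(n, m), crossbars_estimate(n, m))
```

For small cases the count can be checked by hand: four subscribers and two conversations give three possible pairings, so `log_pairings(4, 2)` equals $\log_2 3$.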
The costs of a relay and of a flip-flop pair are of comparable magnitude, and thus if one designed an electronic telephone exchange by merely changing relays to equivalent vacuum tube circuits the chief advantage of the electronic circuit would be one of speed, an improvement of order $10^3$. In many cases this could produce a reduction of cost, since frequently many identical units of a certain type must be supplied because the individual units are slow. This is apt to be the case with units which are associated with the beginning or end of calls but need not be used during the conversation. On the other hand equipment to be used throughout the call would offer less advantage under this tube-for-relay replacement, since the expected duration of calls is long compared to electronic times.

The newer electronic memory devices, however, change this picture considerably. A selectron tube (when these tubes are in production) may be expected to cost $100 or less depending on the demand. It is capable of holding 4096 binary digits, giving a cost per binary digit of the order of 2.5 cents, while the cost of the equivalent relay may be of the order of 2.5 dollars. Mercury delay lines can store information at a comparable cost. Thus it is not impossible that a reduction of the order of 100 to 1 in switching equipment cost might be obtained by the use of electronic devices, even in the parts where information must be stored for long periods of time.

An indication of how such tubes may be used is given in the attached figure. Fig. 1 is a block diagram of a simplified exchange. The calling parties are connected to an electronic commutator which samples the speech signals periodically and puts the various lines in time division multiplex. The called parties are also connected in time division multiplex to a single channel by means of an electronic commutator or distributor.
The function of the middle part is to rearrange the samples in such a way as to provide any desired interconnection between calling and called parties. This is done by dividing the sampling period into two equal parts. During the first half the signal plate of the upper selectron is connected by gate 1 into the calling line multiplex channel. Its windows are caused to open in sequence. Thus at the end of the first half-cycle the first samples of all the incoming channels have been written on the face of the tube in their regular order. During the second half-cycle gates 1 and 3 are closed and gates 2 and 4 are opened. Thus the output of the selectron is fed into the called line multiplex and the windows of the selectron are controlled by the other selectron tube 2. This tube has registered in a suitable notation the numbers of the called line desired by the calling line. The windows of this tube are opened sequentially by the cycling unit and the numbers registered there control the windows on tube 1, allowing the sample from calling channel 1 to go into the proper place in the called line TDM.

By a more elaborate system it is possible to make use of the fact that only a small fraction of the lines will be busy at a given time, as is done in ordinary relay switching. This can be achieved by only supplying enough places in the distributors for the peak load. When a call originates the calling and called parties are assigned idle spaces in the distributor. The place assigned to the called party is registered in the selectron register corresponding to the place assigned to the calling party.

Some Generalizations of the Sampling Theorem

We have seen that a function of time $f(t)$ containing no frequencies over $W$ cycles per second can be described by giving its value at Nyquist intervals (spaced $1/2W$ seconds apart).
It can be reconstructed from these samples using the basic function $\sin 2\pi Wt / 2\pi Wt$, together with the same function shifted by integer numbers of Nyquist intervals. We now consider some generalizations of this result.

In the first place the particular function $\sin 2\pi Wt / 2\pi Wt$ is by no means necessary for the reconstruction. In fact any function $\varphi(t)$ which contains all frequencies up to $W$ is satisfactory. More precisely the spectrum of $\varphi(t)$ should not vanish over any finite set of frequencies (set of positive measure) up to $W$. If $\varphi(t)$ satisfies this condition the original function $f(t)$ can be reconstructed using $\varphi(t)$ and its shifted images $\varphi(t + K/2W)$. That is, coefficients $a_K$ can be found such that

$$f(t) = \sum_{K=-\infty}^{\infty} a_K\,\varphi\!\left(t + \frac{K}{2W}\right).$$

In general the coefficients are not found as easily as in the special case where $\varphi(t) = \sin 2\pi Wt / 2\pi Wt$ (when they are merely the values of $f(t)$ at the Nyquist points) but they may be calculated as follows. Let $F(\omega)$ be the spectrum of $f(t)$ and $\Phi(\omega)$ be the spectrum of $\varphi(t)$. Expand the function $F(\omega)/\Phi(\omega)$ in a Fourier series using $-W$ to $+W$ as the fundamental interval. Thus

$$\frac{F(\omega)}{\Phi(\omega)} = \sum_K a_K\, e^{iK\omega/2W}$$

or

$$F(\omega) = \sum_K a_K\,\Phi(\omega)\, e^{iK\omega/2W}.$$

Taking the transform of the equation we obtain the desired expansion

$$f(t) = \sum_K a_K\,\varphi\!\left(t + \frac{K}{2W}\right).$$

The coefficients in the expansion can therefore be determined as the coefficients of a Fourier series expansion of $F(\omega)/\Phi(\omega)$. In general the functions $\varphi(t + K/2W)$ will not form an orthogonal set and therefore the energy in $f(t)$ cannot be found from $\sum a_K^2$ as it was in the simple case where $\varphi(t) = \sin 2\pi Wt / 2\pi Wt$.

A physical method of performing this expansion can also be given. Consider a filter which gives the output $\sin 2\pi Wt / 2\pi Wt$ when the input is $\varphi(t)$. If the function $f(t)$ is passed through this filter the amplitudes of the output at Nyquist intervals will be the desired coefficients.
This is true since this output can be considered as expanded in the functions $\sin 2\pi Wt / 2\pi Wt$ with the amplitudes as coefficients, and the inverse filter would restore the original function and replace each of these functions with $\varphi(t)$ at the corresponding Nyquist point.

A function $f(t)$ can also be determined from a knowledge of its value and derivative at alternate Nyquist points. We have here the same number of measurements per second, $2W$, but half of these are ordinates of $f(t)$ and half are derivatives. The reconstruction of $f(t)$ from these values can be carried out simply using two basic functions:

$$\varphi_1(t) = \frac{\sin^2 \pi Wt}{(\pi Wt)^2}, \qquad \varphi_2(t) = \frac{\sin^2 \pi Wt}{(\pi W)^2\, t}.$$

Both of these lie entirely within the band $W$, and $\varphi_1$ has the property that it and its first derivative vanish at alternate Nyquist points (except for $t = 0$ where the function is 1 and its first derivative 0). Likewise $\varphi_2$ and $\varphi_2'$ vanish at alternate Nyquist points except at $t = 0$ where $\varphi_2 = 0$ and $\varphi_2' = 1$. Thus we can fit the ordinates of the original function $f(t)$ using $\varphi_1$ and its shifted images (shifted by two Nyquist intervals). The derivatives of $f(t)$ are fitted using $\varphi_2$ and its shifted images. Due to the vanishing of these functions none of the fittings interfere. The function constructed by this process must lie within the band and have the same values and derivatives as the original function $f(t)$ at alternate Nyquist points. That there is only one such function can be shown by arguments similar to those used in the basic sampling theorem, generalized by breaking down the spectrum into an even and an odd part.

It is possible to carry this further and determine a function from knowledge of its value and first $(n - 1)$ derivatives at points separated $n$ Nyquist intervals apart. In this case the basic functions are

[Expressions illegible in the scan. Each legible fragment contains the factor $\sin^n(2\pi Wt/n)\,/\,(2\pi Wt/n)^n$.]

These functions possess the properties:

1. They lie within the band $W$.

2.
They vanish at $t = nK/2W$, $K = \pm 1, \pm 2, \ldots$ (that is, at $n$-th Nyquist points) and also their 1st, 2nd, ..., $(n-1)$st derivatives.

3. At $t = 0$, all derivatives of $\varphi_s$ vanish except the $s$-th derivative, which is 1.

Consequently we can reconstruct $f(t)$ by using $\varphi_s$ to adjust the $s$-th derivatives ($s = 0, 1, \ldots, n-1$) and these adjustments will not interfere.

The functions $\varphi_s$ and their spectra are shown in Fig. 1 for the cases $n = 1, 2, 3$.

C. E. SHANNON
Att.
March 4, 1948

The Normal Ergodic Ensembles of Functions

Among the possible probability distributions in a one-dimensional space certain ones are of special importance because of their simple mathematical properties and frequent occurrence in the physical world. The most important of these is the normal or Gaussian distribution with a density function:

$$\frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{1}{2}\,\frac{x^2}{\sigma^2}\right).$$

In an $n$-dimensional space the most important distribution function is an $n$-dimensional generalization of this, the $n$-dimensional normal distribution:

$$\frac{|a_{ij}|^{1/2}}{(2\pi)^{n/2}} \exp\left(-\frac{1}{2}\sum_{i,j} a_{ij}\,x_i x_j\right).$$

Here $a_{ij}$ is the associated quadratic form and $|a_{ij}|$ the determinant of this form. This form is positive definite and the surfaces of constant probability are found by setting the argument of the exponential function equal to a constant

$$\sum_{i,j} a_{ij}\,x_i x_j = C$$

and are therefore coaxial ellipsoids in the space. The directions of the axes of this ellipsoid are those of the eigenvectors of the form $a_{ij}$ and the lengths are inversely proportional to the corresponding eigenvalues. By a rotation of axes the new coordinate system can be lined up with these directions and the distribution function reduced to

$$\frac{(\lambda_1 \lambda_2 \cdots \lambda_n)^{1/2}}{(2\pi)^{n/2}} \exp\left(-\frac{1}{2}\sum_i \lambda_i\,y_i^2\right)$$

where the $\lambda_i$ are the (positive) eigenvalues and the $y_i$ are the new coordinates. The form $a_{ij}$ being positive definite has an inverse $A_{ij}$ which is also positive definite with eigenvalues $1/\lambda_i$.

The properties of the $n$-dimensional normal distribution which give it particular mathematical importance are the following.

1.
If $x_i$ and $y_i$ are two chance vector variables, which are independent and distributed according to $n$-dimensional normal distributions with quadratic forms $a_{ij}$ and $b_{ij}$ (inverses $A_{ij}$ and $B_{ij}$), then the chance vector variable $z_i = x_i + y_i$ is also distributed normally with the form $c_{ij}$ whose inverse is $C_{ij} = A_{ij} + B_{ij}$.

2. If $x_i$ is a normally distributed vector variable and $y_j = \sum_i r_{ji}\,x_i$ is a vector variable which is a linear operation on $x_i$ (possibly of smaller dimension than $n$) then $y_j$ is normally distributed with the inverse form

$$B_{ij} = \sum_{s,t} r_{is}\,r_{jt}\,A_{st}.$$

3. Under certain quite broad conditions the resultant of a large number of small chance vector variables $x_i^{(s)}$ ($s = 1, 2, \ldots, N$) with arbitrary distribution functions, which are independent, gives a normal distribution for the sum $\sum_s x_i^{(s)}$ with [formula illegible in the scan], providing no term of the sum contributes more than a small fraction to any $B$.

4. If the a priori probabilities for each of two independent vectors $x_i$ and $y_i$ are both normal, the a posteriori probability of $x_i$ when we know the sum $x_i + y_i = z_i$ is normally distributed (about a displaced mean, however).

5. The mean value of $x_i x_j$ for $x_i$ normal is given by

$$\overline{x_i x_j} = A_{ij}.$$

Among the many possible ergodic ensembles of functions $f_\alpha(t)$ there is also a certain class of particular mathematical and physical importance. This class of ensembles can be considered a generalization of the $n$-dimensional normal distribution to infinite dimensional function spaces ergodic under translations in time. We shall call these normal ergodic ensembles of functions. They are completely specified by giving their power spectra $P(\omega)$ or their autocorrelation functions $A(t)$, which are the Fourier transforms of the power spectra.

The normal ergodic ensembles can be defined in various ways. They occur physically when we pass a thermal noise through a filter, shaping the power spectrum to $P(\omega) = |Y(\omega)|^2$, $Y(\omega)$ being the admittance of the filter.
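Properties 1 and 5 of the n-dimensional normal distribution listed above lend themselves to a quick numerical check. The following sketch is illustrative only: the two 2 x 2 inverse forms $A$ and $B$, the sample size and the helper names are assumptions chosen for the example. It draws independent normal vectors whose mean products are $A_{ij}$ and $B_{ij}$, adds them, and compares the sample mean products of the sum with $A_{ij} + B_{ij}$.

```python
import math
import random


def cholesky_2x2(cov):
    """Lower-triangular factor L of a 2 x 2 positive definite matrix, cov = L L^T."""
    l11 = math.sqrt(cov[0][0])
    l21 = cov[0][1] / l11
    l22 = math.sqrt(cov[1][1] - l21 * l21)
    return l11, l21, l22


def sample_normal_2d(cov, rng):
    """One zero-mean normal vector whose mean products x_i * x_j are cov[i][j]."""
    l11, l21, l22 = cholesky_2x2(cov)
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return l11 * z1, l21 * z1 + l22 * z2


def mean_products(samples):
    """Sample estimate of the matrix of mean values of x_i * x_j."""
    count = len(samples)
    sxx = sum(x * x for x, _ in samples) / count
    sxy = sum(x * y for x, y in samples) / count
    syy = sum(y * y for _, y in samples) / count
    return [[sxx, sxy], [sxy, syy]]


rng = random.Random(1948)
A = [[2.0, 0.6], [0.6, 1.0]]      # hypothetical inverse form A_ij of x
B = [[1.0, -0.3], [-0.3, 0.5]]    # hypothetical inverse form B_ij of y

sums = []
for _ in range(100_000):
    x = sample_normal_2d(A, rng)
    y = sample_normal_2d(B, rng)
    sums.append((x[0] + y[0], x[1] + y[1]))

C_est = mean_products(sums)
C_expected = [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]
print(C_est)
print(C_expected)
```

The estimate agrees with $A_{ij} + B_{ij}$ only to within sampling error, which shrinks roughly as the inverse square root of the number of samples.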
In the literature on noise these ensembles are often treated in a loose, somewhat illogical fashion by using either of two "representations." The first representation is

$$\sum_{n=0}^{\infty} \sqrt{P(n\,\Delta f)\,\Delta f}\;\cos(n\,\Delta f\,t + \theta_n).$$

The $\theta_n$ are all uniformly and independently distributed over all values from 0 to $2\pi$. This representation amounts to making the noise the sum of a large number of small sinusoidal waves with random phases, and amplitudes adjusted to give the proper power density in any small frequency range. The frequency increment between adjacent waves $\Delta f$ is supposedly very small and in use one evaluates any desired statistic of this set of functions and determines the limit approached by this statistic as $\Delta f \to 0$. This limit is taken to be the desired statistic of the normal ergodic ensemble. The second representation is similar but uses normally distributed amplitudes $a_n$ whose variance $\sigma^2$ is equal to $P(\omega)$:

$$\sum_n a_n \sqrt{\Delta f}\;\cos(n\,\Delta f\,t + \theta_n).$$

Actually these "representations" will not give the correct answer in all cases. For example, if we ask what fraction of the functions in the representation ensemble are periodic, we find that all are, so the probability is unity, and the limit as $\Delta f \to 0$ is also therefore unity, while almost none of the functions in the ergodic normal ensemble are periodic. However it can be shown that if we restrict ourselves to what we have called physical statistics, the answer will be identical; the normal ergodic ensemble is the physical limit of either of the above ensembles as $\Delta f \to 0$.

A more logical definition of a normal ergodic ensemble can be given as follows. We divide the frequency range up into unit intervals and construct the sequence of "flat" ensembles for these intervals. These will be given by

[formula illegible in the scan]

These ensembles are passed through shaping filters to give the proper power spectrum in the interval in question and the results added.
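The remark that every function in the random-phase representation is periodic can be seen directly in a small experiment. The sketch below is an illustration under stated assumptions: the power spectrum $1/(1+\omega^2)$, the spacing $\Delta f = 0.05$, the truncation to 200 terms and the function names are all hypothetical choices, and $n\,\Delta f$ is treated as an angular frequency, as the form $\cos(n\,\Delta f\,t + \theta_n)$ suggests. Each realization then repeats exactly after the time $2\pi/\Delta f$.

```python
import math
import random


def random_phase_wave(power_spectrum, delta_f, n_terms, rng):
    """Return f(t) = sum_n sqrt(P(n*delta_f) * delta_f) * cos(n*delta_f*t + theta_n)
    with the phases theta_n drawn independently and uniformly from [0, 2*pi)."""
    amplitudes = [math.sqrt(power_spectrum(n * delta_f) * delta_f)
                  for n in range(n_terms)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_terms)]

    def f(t):
        return sum(a * math.cos(n * delta_f * t + theta)
                   for n, (a, theta) in enumerate(zip(amplitudes, phases)))

    return f


def spectrum(w):
    """Hypothetical power spectrum used only for this illustration."""
    return 1.0 / (1.0 + w * w)


rng = random.Random(0)
delta_f = 0.05
f = random_phase_wave(spectrum, delta_f, n_terms=200, rng=rng)

period = 2.0 * math.pi / delta_f   # every term n*delta_f*t advances by 2*pi*n
for t in (0.0, 1.7, 12.3):
    print(t, f(t), f(t + period))  # the two values agree up to rounding
```

Making $\Delta f$ smaller lengthens the period without removing it, which is why the fraction of periodic functions stays at unity in the limit even though almost no function of the normal ergodic ensemble is periodic.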
The normal ergodic ensembles have properties analogous to the $n$-dimensional normal distributions which we have given. We have

Theorem: The sum of two functions $f_\alpha(t) + g_\beta(t)$ where $f$ and $g$ are from normal ergodic ensembles with spectra $P_1$ and $P_2$ is normal ergodic with spectrum $P_1 + P_2$.

Theorem: The output of any linear invariant transducer driven by a normal ergodic ensemble is normal ergodic with spectrum $|Y(\omega)|^2 P(\omega)$.

Theorem: Any finite dimensional linear operation on a normal ergodic ensemble gives a normally distributed vector.

March 15, 1948
C. E. SHANNON

Systems Which Approach the Ideal as $P/N \to \infty$

We will show that it is possible to construct an instantaneous system for sufficiently large $P/N$ for transmitting a sequence of binary digits such that the frequency of errors is arbitrarily small and the power required only slightly greater in db than the ideal for the corrected rate of transmission. More precisely we have the

Theorem: Given any $\epsilon > 0$ and $\delta > 0$ we can transmit binary digits on an instantaneous basis with frequency of errors $< \epsilon$ and corrected rate of transmission

$$R > W \log\left[1 + (1 - \delta)\frac{P}{N}\right].$$

The system to be used is of PCM type with an extremely large number of amplitude levels. Let there be $2^s$ levels, and number them with a binary notation, but in the Stibitz type code, so that only one binary digit changes on going to an adjacent level. If we are in error by $d$ levels, at most $d$ binary digits of the $s$ will be incorrect. If there are many levels within the $\sigma$ distance [illegible in the scan] of the noise, the expected number of errors will be approximately [illegible in the scan]. We take $P/N$ large enough so that $\epsilon s >$ [illegible in the scan]. Thus the frequency of errors in our final result will be $< \epsilon$. The levels should not be spaced uniformly but according to the density of a normal distribution. If this is done the received signal will be nearly Gaussian with $\sigma = \sqrt{P + N}$ and the corrected rate of transmission

$$R > W \log\left[1 + (1 - \delta)\frac{P}{N}\right].$$

C.
£• SHANNON March 29, 194$ DO Theorems on Statistical Socuencea If It la poaalbla to go froa any state with P > to any other alone a path of probability p > 0, tha system la argodlo and tha atrong law of large nuabera can be applied. Thus the number of tines a given path p^j in the network la traversed in a long sequence of length K is about proportional to the probability of being at i and then chosaing this path, P.p. 4 K. If N is larne enough the probability of percentage error i 6 In thia la less than c so that for all but a aet of email probability the actual numbers lie within the limits Hence the probability that nearly all sequences lie within limits ± ft is given by and lfijLJfc l B limited by • I(P lPiJ ± |)log PiJ or | ^ - * PiPij log Pijj < * Thus we have I Theorem For almost all sequences 2 Um ' to*-* • H • - i PiPij log Pjj where p is the probability of the sequence baring the block of length L starting at the first position. Thus for all but a set of blocks of probability < « and for B large enough (H - $)«<- log p < (H ♦ n)H *.p(H - q)H. < — p log p < P(H ♦ n)M where «e hare aummed orer all but the set of small probability i. p(H ♦ a.)I £ (I ♦ sJM * P S W * *>* and * p(H - q)* (H - q)I * P U - q> ■ U - •> For the sot of oaall probability •I p log p ^ log ^ since this is maximised f or ip • t by making all p equal, and the number of them 1 -Jj • But this is dominated by • l P log p| £ |«W lo« | 1 •» with « as snail as d« sired for sufficiently large K and small c. Henee this does not affect the sua ia the limit as I -* oo and we have the Theorems Lia £ I p (B t ) log p(B L ) - H I - oo where plB^ is ths probability of block B^ of length L, and the sua is ovsr all possible blocks. We now prove the Theorem H • - i. p(B i jSj) log PB^ 8 !* « Lie -* q(B t Sj) log q B (3^) UBHoe where p(B lt 8j) is the probability of block B i followed by 8^ and PB^Sj) is the conditional probability of 8j after the block B t ia known to occur. 
q(B lt 8j) in the probability when B^ ia computed on the basis of any initial state probabilities, not necessarily the proper ones and q^Sj) the corresponding condi- tional probabilities. The first equality is trus since we may summ first on all B ± leading to a given state K. *he terms q, B ^CS ^) are then all equal to Pjj and the terse qlB^j) sum to P K Pjj gives the desired result. If the q»s are used, the q^lSj^ are still p^ where I It the stat* In which B± ends. * qU-.S.) • p kj i. P(B 1 ) since any Initial distribution tends toward equilibrium. We hare shown that apart from a set of small probability, the probabilities of blocks of length L lie within the limits -(H - S)M .(H ♦ S)M * < S> < 2 where S can be made small by taking B large enough. Let the maximum number of blocks of length M when we delete a set of measure • be Q g («). Thent I p - (1 - t) remaining set Q (I) p - Q (M) 2* lH * * )M t max c log t l«) > (H ♦ 6)M ♦ log(l - t) Hence log (li) Lim S - %U) £ 8 I -CO II Similarly 1 > I p > G C (K) pj^B frota which we obtain log and •U) * H Hence we hare Theoremi vU) - » 'or t J 1 0, 1 Tha fact that for large M nearly all blocks hare a probability limited by ri°JLE ♦ s < * does not imply that those probabilities approach equality. In fact they will generally diverge from one another but the db range becomes small compared to K, eince for p's satisfying 6 this inequality *»« Pmax lQ g Pmln m log _ I II 1 It it possible to show, however, that thert exists among the blocks of length It a subset, all of equal probability which hare the sane growth with K as the set including all blocks except those of small probability totaling less than t: namely , the subset will contain more than 2* H " ^ N eleoents with 5 arbitrarily small. Consider all blocks beginning in a given state, say state 1, and ending in this state. Let these blocks B 1 fig*... have lengths n^, n 2 ,...., t^, .... and conditional probabilities p^, p 2 , p at ..... when we start from state 1. 
We first prove -1 Theorem: I p^n^ • p^ The first part is true since the ergodic character of the system makes the Inverse frequency of occurrence of state 1, equal to the mean distance between its occurrences, I Pi*i« The second part is true since almost all blocks of large length N have approximated the proper frequency of each B^. Now we return to the construction of a subset of growth (H . 6)1 2 all of equal probability* Let us choose integers a i at close as possible to and construct sequences with of the block B ± . The number of block* is then and the number of sequences: » <- P t log p t The growth Is then in term* of symbols lag* . , * 4* . This proves the following! Theorems Given I > there exists a set of M blocks of length X (when H is sufficiently large) such that AS - ft)S k> a and each block has the same probability, and starts and ends in the eeme state, which can be chosen arbitrarily* In case the system is not ergodle but made up of a finite number of ergodle systems: r - X c t r t each r t will hare a rate H i which we may assume arrengee in a now increasing sequence The function %{•) then bieoMi a decreasing atep function in the manner Indicated by the following I Theorem! In the case conaidered K-l ?(c) • in the internal la^ <i< j ^ For if c it in the range indicated we oust take a set of poaitiTe probabilities froa at least one of r 1# ...» rj. This gives a growth of type at least, and can be limited to this by choosing all sequences The quantity will be called the man statistical rata for the system. C. E. SHAM UGH April 26, 194* Samples of Statistical English C B S^a**o* A number of samples of statistical English including probability structure out to four, words are given below. These were constructed by starting off with three words from a book. These three words are shown to someone who fits them in a reasonable English sentence and writes down the word following the three. 
The first word is then covered up and the process repeated with a different person, etc. If the imagined sentence ends after the added word, the person writing the word adds a period. For samples bearing a title the participants were told that this was the subject dealt with. These samples may be compared with those in "A Mathematical Theory of Communication" where less statistical structure is included.

The samples given here were obtained, for the most part, with the aid of J. R. Pierce, B. McMillan, C. C. Cutler and W. E. Mathews. A few of the samples were obtained from other sources (contemporary literature, etc.) and are included for comparison. The reader may try his skill at guessing which are statistically constructed. The true sources are given at the end.

1. This was the first. The second time it happened without his approval. Nevertheless it cannot be done. It could hardly have been the only living veteran of the foreign power had stated that never more could happen. Consequently people seldom try it.

2. John now disported a fine new hat. I paid plenty for the food. When cooked asparagus has a delicious flavor suggesting apples. If anyone wants my wife or any other physicist would not believe my own eyes. I would believe my own word.

3. That was a relief whenever you be let your mind go free who knows if that pork chop I took with my cup of tea after was quite good with the heat I couldn't smell anything off it I'm sure that queer looking man in the

4. In a few days was the minimum amount of money remaining to the end. However everyone knows the meaning implied. It was true when Cutler says that we should proceed carefully. When you love yourself too much. The woman who accosted

5. Fourscore and twenty years passed before we could meet them that isn't already done should have been a good son is going fast according to the teacher of his ability. His intelligence sufficed for the time. This cannot change much.

6.
Even the killing was atrociously perpetrated by the cruelest treatment that a small boy jumped over the hedge and buried her. A grave fault of many approaches to the furthermost reaches of the state. Politics and business are becoming lost to the

7. It is an Italian ox mouth dish. The only thing in the room is worms. I am the director of the seminar. In an evolving hemisphere. C'est Monsieur Jardin. I am a patient. Oh my dear Plapsen, you are my dearest Klapsen.

8. He took it with many other matters are more apparent if they think so. Is there a reason for supposing that most people don't. Nevertheless sex is absolutely necessary as though the electron diffraction camera plate up on the top surface of

9. Fifteen years before the mast, he ever had eaten. Try it and see, I believe that whatever arises a fund has been accumulated sufficiently in the near future holds many surprises. No man can judge his actions by his wife Susie.

10. I forget whether he went on and on. Finally he stipulated that this must stop immediately after this. The last time I saw him when she lived. It happened one frosty look of trees waving gracefully against the wall. You never can

11. When I bought my wife a long time ago. I knew that it wasn't faster when he didn't eat or drink a toast to John Doe, otherwise known as McMillan's theorem. Whatever the nature of Christ's teachings. Go far into

12. McMillan's Theorem
McMillan's theorem states that whenever electrons diffuse in vacua. Conversely impurities of a cathode. No substitution of variables in the equation relating these quantities. Functions relating hypergeometric series with confluent terms converging to limits uniformly expanding rationally to represent any function.

13. House Cleaning
First empty the furniture of the master bedroom and bath. Toilets are to be washed after polishing doorknobs the rest of the room.
Washing windows semi-annually is to be taken by small aids such as husbands are prone to omit

14. Epiminondas
Epiminondas was one who was powerful especially on land and sea. He was the leader of great fleet maneuvers and open sea battles against Pelopidas but had been struck on the head during the second Punic war because of the wreck of an armored frigate.

15. Salaries
Money isn't everything. However, we need considerably more incentive to produce efficiently. On the other hand too little and too late to suggest a raise without a reason for remuneration obviously less than they need although they really are extremely meager.

16. Murder Story
When I killed her I stabbed Claude between his powerful jaws clamped cruelly together. Screaming loudly despite fatal consequences in the struggle for life began ebbing as he coughed hollowly spitting blood from his ears. Burial seemed unnecessary since further division was necessary.

The sources are: 3, from "Ulysses" by James Joyce, page 748; 7 and 14 are the conversation and writings of two schizophrenic patients (quoted from Bleuler, "A Textbook of Psychiatry"). All others constructed by statistical means.

C. E. SHANNON
June 11, 1948

THE DEPARTMENT OF DEFENSE
RESEARCH AND DEVELOPMENT BOARD
Washington 25, D. C.

Prepared by THE PANEL OF COMMUNICATIONS OF THE COMMITTEE ON ELECTRONICS

5. SIGNIFICANCE AND APPLICATION
C. E. Shannon
Bell Telephone Laboratories, Murray Hill, N. J.

1. Introduction.

A general communication system is shown in Figure 3. An information source produces a message. This is encoded in a transmitter to produce a signal suitable for transmission over the channel. During transmission the signal may be perturbed by noise. The perturbed signal is decoded or demodulated at the receiver to recover, as well as possible, the original message. The situation is roughly analogous to a transportation system for transporting physical goods from one point to another.
We can imagine, for example, a lumber mill producing lumber at an average rate of R cubic feet per second and a conveyor system capable of transporting C cubic feet per second. If R is greater than C the full output of the mill cannot possibly be carried on the conveyor. On the other hand, if R is less than or equal to C it may or may not be possible, depending on whether the lumber can be efficiently packed in the available space of the conveyor. However, if we allow ourselves to saw the lumber up into suitable sizes and shapes we can always approach 100 per cent efficiency in packing. In this case we must, of course, supply a carpenter shop at the other end of the conveyor to reassemble the lumber in its original form before passing it on to the consumer.

If the analogy is sound we might hope to define two parameters R and C associated with an information source and a channel, respectively. R should measure, in some sense, how much information is produced per second by the source, and C the capacity of the channel when used in the most efficient manner for transmitting information. We would expect then that if $R > C$ the full output of the source cannot be transmitted satisfactorily. If $R \le C$ it should be possible to transmit the output of the source by proper encoding and decoding at transmitter and receiver. It turns out that it is possible to define quantities R and C which measure these information rates and capacities and satisfy the desired relationships. We will attempt to show how this can be done without, however, giving mathematical proofs of the results.(1)

(1) For mathematical details, see Shannon, C. E., "A Mathematical Theory of Communication," Bell System Technical Journal, July and October, 1948. See also Shannon, C. E., "Communication in the Presence of Noise," Proceedings of the I.R.E. (Forthcoming).

2. The Information Source.

The first problem is that of clarifying the nature of "information" and finding a measure of the rate of production for an information source. Information involves basically the concept of "choice." An information source chooses one particular message from a set of possible messages. If there were only one possible message there would be no communication problem. The amount of information produced by a source must evidently be related to the range of choice available.

The simplest possible choice is a choice from two equally likely possibilities, say 0 or 1. We shall call the corresponding unit of information a binary digit or "bit." A relay or flip-flop circuit has two possible states and is capable of storing one bit of information. A device which chooses at random from 0 or 1, making one choice each second, is considered to be producing information at rate R of one bit per second. Such a source produces a "message" which is a random sequence of 0's and 1's. A choice from, say, 32 equally likely possibilities can be considered as a series of five choices, each from two equally likely possibilities, and, therefore, should correspond to five bits. More generally, a choice from n equally likely possibilities represents $\log_2 n$ bits.

Suppose now that the various possible choices have different probabilities of occurrence, say $p_1, p_2, \ldots, p_n$. How much information is produced when a choice is made under these circumstances? One feels intuitively that less "choice" is involved in a device which chooses between 0 and 1 with probabilities .01 and .99 than in one which chooses with equal probabilities. In the former case the result is almost sure to be 1. The following example shows that by proper encoding an average compression can be obtained by using the probabilities $p_1, p_2, \ldots, p_n$. Suppose there are four possible choices A, B, C, D with probabilities $p_A = 1/2$, $p_B = 1/4$, $p_C = 1/8$, $p_D = 1/8$.
If we use a simple direct code into binary digits:

A = 00   B = 01   C = 10   D = 11,

we use two binary digits per letter. On the other hand, using the following code where more probable letters are given short codes and less probable letters longer codes, we obtain an average saving:

A = 0   B = 10   C = 110   D = 111.

This is a reversible code; the original text can be recovered from the encoded sequences as is readily verified. With this code we need, on the average, only (1/2 x 1 + 1/4 x 2 + 1/8 x 3 + 1/8 x 3) = 1 3/4 binary digits per letter. We may say then that a choice with probabilities 1/2, 1/4, 1/8, 1/8 corresponds to 1 3/4 bits of information. If an information source were producing a sequence of the letters A, B, C, D with these probabilities we could encode it into a sequence of binary digits in which 1 3/4 binary digits are used on the average for each letter of message.

A general analysis of the situation shows that if the letters are chosen with probabilities $p_1, p_2, \ldots, p_n$ then it is possible to encode into binary digits using

$H = -\sum_i p_i \log_2 p_i$

binary digits per letter of message on the average, and there is no method of reversible encoding using less. This H then is the equivalent number of bits per letter, and, if the source produces n letters per second, R = nH is the rate of production in bits per second.

In the case of English text the statistical structure is more involved. There are the various letter probabilities $p_i$, but, also, there are statistical influences between nearby letters. For example, the letter T is more often followed by H than by any other letter, a Q is almost invariably followed by U, etc. In such cases there is a more general formula for calculating the equivalent number of bits per letter of message. Let $p(i, j, \ldots, s)$ be the probability in the language of the sequence of letters $i, j, \ldots, s$. Then we define

$G_n = -\frac{1}{n} \sum_{i, j, \ldots, s} p(i, j, \ldots, s) \log_2 p(i, j, \ldots, s)$

where the sum is over all sequences of letters which are just n letters long. The sequence $G_1, G_2, \ldots, G_n, \ldots$ represents a series of approximations to the desired H which takes into account more and more of the statistical structure as we proceed along the sequence. The information per letter of message can be defined by the limiting value of the G's:

$H = \lim_{n \to \infty} G_n$

It can be shown that H has the desired properties; namely, we can encode the messages from the source into binary digits using H binary digits per letter on the average, and no method of encoding uses less. For the English language H has been estimated at roughly 2 bits per letter, taking account only of the statistical structure out to about 6 or 8 letters.

If the messages produced by the information source are continuous functions of time, as in speech or television transmission, the situation is much more involved and we will not discuss it in detail. It is still possible to assign a rate of production of information in bits per second to such a source, but the rate now depends on other considerations. With continuous functions as messages, exact reproduction is not generally required and the rate R depends on the amount and nature of the discrepancy which can be tolerated between the original and recovered messages. The tolerable discrepancy in turn is determined by the final destination of the messages. With speech, for example, the tolerable errors depend on the structure of the human ear and brain.

Although the mathematical problems involved in defining the rate for a continuous source have been completely solved, it is in practical cases very difficult to estimate R. The following calculation may be of some interest, however. Suppose we are interested only in transmitting English speech (no music or other sounds), and the quality requirements on reproduction are only that it be intelligible as to meaning. Personal accents, inflections, etc., can be lost in the process of transmission.
In such a case we could at least in principle, transmit by the following scheme. A device is constructed at the trans- mitter which prints the English text corresponding to the spoken words These can be ^ translated into binary digits in the ratio of about two binary digits per letter, or ^x4.D - v per word. Taking 100 words per minute as a reasonable talking speed we obtain 900 bits per minute or 15 bits per second as an estimate of the rate for English speech when in- telligibility is the only fidelity requirement. 3. The Capacity of a Channel . We now consider the problem of defining the capacity C of a channel for transmitting Information. Since we have measured the rate of production for an information source in 17 mitted over a given channel? in some cases the answer Is simple. With a . tele «»J%*£Z ^second, can send 5n bits per second. Suppose now that the channel is defined £ fc^j. JJ- ^ Vyclef pTrse^nfwide . tions of time f(t) which lie within a cer ^»^ a series of It is known that a function of thi^type can be J£j say that such a function equally spaced sampling points^ seconds apart Thus we may say has 2W degrees of freedom, or dimensions, per second. If there is no noise whatever » Even when there is noise, if we place no ^tjon s ^JgPSSS!SSU capacity will be infinite for we m **£W2?£tof e« p transmitter number of different amplitude levels .^^nw^etevres The capacity depends, of limitation. The shiest type o, noise is white V^tt'S^K''' distribution of ampUt^s is Ga**ta, and to a eetrnmr s ilat q 7 ^ tf into a unit resistance. The simplest limitation on transmitter power is ^^^S^£%M SLr«TL£T£K SLrto/eTarametLs W, P, and N, the capacity C can be calculated. It turns out to be C = W log 2 E -^ Ji (bits per second). P + N N different amplitudes at each sample point. In a time T there will be 2TW independent samples. 
Thus, there are approximately

$M = \left(\sqrt{\frac{P + N}{N}}\right)^{2TW} = \left(\frac{P + N}{N}\right)^{TW}$

different signal functions of duration T that can be distinguished from one another in spite of the noise. This corresponds to

$\log_2 M = TW \log_2 \frac{P + N}{N}$

binary digits in the time T, or

$C = W \log_2 \frac{P + N}{N}$

binary digits per second. This formula has a much deeper and more precise significance than the above argument would indicate. In fact it can be shown that it is possible, by properly choosing our signal functions, to transmit $W \log_2 \frac{P + N}{N}$ binary digits per second with as small a frequency of errors as desired. It is not possible to transmit binary digits at any higher rate with an arbitrarily small frequency of errors. This means that the capacity is a sharply defined quantity in spite of the noise. These statements are proved by two different methods.*

The formula for C applies for all values of P/N. Even when P/N is very small, the average noise power being much greater than the average transmitter power, it is possible to transmit binary digits at the rate $W \log_2 \frac{P + N}{N}$ with as small a frequency of errors as desired. In this case $\log_2 \left(1 + \frac{P}{N}\right)$ is approximated by $\frac{P}{N} \log_2 e = 1.443 \frac{P}{N}$ and we have approximately

$C = 1.443 \frac{PW}{N}$

It should be emphasized that it is only possible to transmit at a rate C over a channel by properly encoding the information. In general, the rate C is only approached as a limit by using more and more complex encoding and longer and longer delays at both transmitter and receiver. In the white noise case the best encoding is such that the transmitted signals themselves have the structure of a white noise with power P. The difficulty with the approximate argument given for that case, and the reason it does not give a sharply defined capacity, is that the selection of signals is not optimal. The distribution of amplitudes is not Gaussian as it should be.

4. Comparison of Ideal and Practical Systems.
In Figure 4 the curve is the function $\frac{C}{W} = \log_2 \left(1 + \frac{P}{N}\right)$ plotted against P/N measured in db. It represents, therefore, the channel capacity per unit of band with white noise. The circles and points correspond to PCM and PPM systems used to send a sequence of binary digits and adjusted to give about one error in $10^5$ binary digits. In the PCM case the number adjacent to a point represents the number of amplitude levels; 3, for example, is a ternary PCM system. In all cases positive and negative amplitudes are used. The PPM systems are quantized with a discrete set of possible positions for the pulse; the spacing is $\frac{1}{2W}$, and the number adjacent to a point is the number of possible positions for a pulse.

The series of points follows a curve of the same shape as the ideal but displaced horizontally about 8 db. This means that with more involved encoding or modulation systems a gain of 8 db in power could be achieved over the system indicated.

*See Shannon, C. E., "A Mathematical Theory of Communication" and "Communication in the Presence of Noise."

Of course, as one attempts to approach the ideal, the transmitter and receiver required become more complicated and the delays increase. For these reasons there will be some point where an economic balance is established between the various factors. It may well be, however, that even at the present time more complex systems would be justified.

A curious fact illustrating the general misanthropic behaviour of Nature is that at both extremes of P/N (when we are well outside the practical range) the series of points in Figure 4 approach more closely the ideal curve. At very large P/N the PCM points approach to within 4.5 db of the ideal, while with very small P/N the PPM points approach to within 3 db.

The relation $C = W \log_2 \left(1 + \frac{P}{N}\right)$ can be regarded as an exchange relation between the parameters W and P/N. Keeping the channel capacity fixed we can decrease the bandwidth W provided we increase P/N sufficiently.
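The exchange between W and P/N can be checked numerically. The following is only a sketch: the function names and the sample figures (a 3000-cycle band at 30 db) are illustrative assumptions, not values from the paper. It evaluates $C = W \log_2 (1 + P/N)$ and then inverts the same relation to find the P/N needed when the band is halved at fixed C.

```python
import math


def capacity_bits_per_second(bandwidth: float, snr: float) -> float:
    """Capacity C = W log2(1 + P/N) of a band W with white noise."""
    return bandwidth * math.log2(1.0 + snr)


def required_snr(capacity: float, bandwidth: float) -> float:
    """Invert C = W log2(1 + P/N) to get the P/N needed for a given C and W."""
    return 2.0 ** (capacity / bandwidth) - 1.0


def to_db(ratio: float) -> float:
    """Express a power ratio in decibels."""
    return 10.0 * math.log10(ratio)


if __name__ == "__main__":
    # Hypothetical operating point: W = 3000 cycles per second, P/N = 1000 (30 db).
    w, snr = 3000.0, 1000.0
    c = capacity_bits_per_second(w, snr)
    # Same capacity in half the band: 1 + P/N must be squared.
    snr_half_band = required_snr(c, w / 2.0)
    print(f"C = {c:.0f} bits per second")
    print(f"P/N at W   = {to_db(snr):.1f} db")
    print(f"P/N at W/2 = {to_db(snr_half_band):.1f} db")
```

Because halving W squares the quantity 1 + P/N, the required ratio in db roughly doubles when P/N is large.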
Conversely, an increase in band allows a lower signal-to-noise ratio in the channel. The required P/N in db is shown in Figure 5 as a function of the band W. It is assumed here that as we increase W, N increases proportionally: $N = W N_0$, where $N_0$ is the noise power per cycle of band. It will be noticed that if P/N is large a reduction of band is very expensive in power. Halving the band roughly doubles the signal-to-noise ratio in db that is required.

The channel capacity C can be calculated in many other cases. A general result that applies in any situation where the average transmitter power is limited to P is that the channel capacity is bounded by:

$W \log_2 \frac{P + N_1}{N_1} \le C \le W \log_2 \frac{P + N}{N_1}$

where $N_1$ is a parameter called the "entropy power" of the noise. It is defined as the power in a white noise having the same entropy as the actual noise. N is, as before, the average noise power.

REFERENCES

Nyquist, H. "Certain Factors Affecting Telegraph Speed," Bell System Technical Journal, April 1924, p. 324.
Nyquist, H. "Certain Topics in Telegraph Transmission Theory," A.I.E.E. Transactions, Vol. 47, April 1928, p. 617.
Hartley, R. V. L. "Transmission of Information," Bell System Technical Journal, July 1928, p. 535.
Shannon, C. E. "A Mathematical Theory of Communication," Bell System Technical Journal, July, October, 1948.
Shannon, C. E. "Communication in the Presence of Noise," Proceedings of the I.R.E. (Forthcoming).
Tuller, W. G. Sc.D. Thesis, Department of Electrical Engineering, Massachusetts Institute of Technology, 1948.
Wiener, N. The Interpolation, Extrapolation and Smoothing of Stationary Time Series, NDRC Report (Forthcoming as a book to be published by John Wiley and Sons, Inc., New York).
Wiener, N. Cybernetics. John Wiley and Sons, Inc., New York, 1948.
Bailey, R. D., and Singleton, H. E. "Reducing Transmission Bandwidth," Electronics, August 1948, p. 107.

Note on Certain Transcendental Numbers

Claude E.
Shannon

This note calls attention to a certain class of numbers that are easily shown to be transcendental but seem to have escaped previous notice. A typical example is the number

$X = 2^{-2^{-2^{\cdot^{\cdot^{\cdot}}}}}$

or more precisely

$X = \lim_{n \to \infty} X_n, \quad X_{n+1} = 2^{-X_n}, \quad X_0 = 2^{-2}.$

It is easily seen that X exists and satisfies the equation $X = 2^{-X}$. It is known from a conjecture of Hilbert, proved by Gelfond and by Schneider, that $a^x$ is transcendental if $a \ne 0, 1$ is algebraic and x is an algebraic irrational. Now X is clearly not rational, and if we suppose it an algebraic irrational, it must then be transcendental, a contradiction. Hence it is transcendental.

More generally let f be a function such that if x is algebraic and does not belong to a set S, then f(x) is transcendental. Let $g_1$ and $g_2$ be algebraic functions such that $x \ne g_1 f g_2 x$ for $x \in S$. Then the solutions of

$x = g_1 f g_2 x$

are transcendental by a similar argument, using the fact that $g_1^{-1}$ is algebraic. If the sequence $X_n = (g_1 f g_2)^n X_0$ approaches a limit X it must be transcendental. Some functions known to have the property required for f are $\sin x$, $e^x$ and $J_0(x)$, the exceptional set S consisting of the number 0.

C. E. SHANNON
October 27, 1948
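The limit in the typical example can be approximated by carrying out the recursion $X_{n+1} = 2^{-X_n}$ directly. The sketch below is illustrative only: the function name, the number of steps and the starting value are assumptions of this example. The iteration converges because the slope of $2^{-x}$ at the fixed point is less than 1 in magnitude.

```python
def tower_limit(steps: int = 200, x0: float = 0.25) -> float:
    """Iterate X_{n+1} = 2**(-X_n) from x0 and return the last iterate.

    x0 and steps are arbitrary choices for this sketch; the map 2**(-x)
    contracts near its fixed point, so the iterates settle quickly.
    """
    x = x0
    for _ in range(steps):
        x = 2.0 ** (-x)
    return x


if __name__ == "__main__":
    x = tower_limit()
    # The limit should satisfy X = 2**(-X) to within rounding error.
    print(x, 2.0 ** (-x))
```

The value obtained is about 0.641. A numerical check of this kind shows only that the limit exists and satisfies $X = 2^{-X}$; it says nothing about transcendence, which rests on the Gelfond-Schneider argument.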
A CASE OF EFTIC1EHT CGDI83 FOl A BOIST CHAH38L Consider a di aerate channel with two poeeiMe symbols and 1* Hoise it aeeuaec to affect successive cyrbolB inde- pendently **nd in such 6 wty that t o probability of a syjabol bainf, inter, reted correctly at the receiver ie j> » * g 1 wnlealg the probability of incorrect interpretation io q - ^ 2 ca^city of such & channel is - e 2 Ve e©»us» e very soall and epproximte log (1 ♦ c) by z 2 * e 2 (natural units) In bits .or ayebel, the capacity 1st C - log*, a A vary eiaple coda can be oonetruct<*J for this eyatea to aond a Doquence of random binary dibits at nearly the rata C with a quite snail frequency of errors | In other wards a code Wuich la not far fron the ideal* The code is merely to repeat each binary digit in the oeeeage a large number n of tiasee. At the roceiver, a group of n is received, end the rajority report la taken aa the original nessags eynbol. If the m&mrp eynhol is then a f s are trans-itted. At tilt receiver the n received eynbols will be a -istur© of 0*8 und l»a the number of 0*s present will be distributed ac- cording to a binonial distribution with p • I *, * and q ■ For large n the binonial distribution is approximately nornal (and this approximation is especially ^ood when p 5 s close to i). The exacted nc->*r of O'c is p n, and the standard devia- tion is; An error occu*e when the number of rocoivod O'o ie lose than l.e* when the actual number of cores is p n - § av*iy froo t;ie ejected nunber. In terras €>f r this iat *■ - ^ — ^ standard deviations. Hence the frequency of errors is given by the area of a noma! curve with otandard deviation equal to unity fron a out to m. To obtain a frequency of errors 10*3, say, we mist have a ■ 1*5 n t and the rate is -JL. as coopered with the rate 1«.&5 the 2.3 ideal (with essentially zero froquency of errors). Hovenber IS, c. s. 
SHANNON

December 6, 1948

Note on Reversing a Discrete Markhoff Process

In "A Mathematical Theory of Communication" a language was represented by a discrete Markhoff process with a finite number of possible states. Such a stochastic process can be represented schematically by means of an oriented linear graph as in Fig. 1. Consider the question of generating the same language in reverse; for example, English but read backwards. Can we always invert a finite state Markhoff process and obtain a finite state Markhoff process? The answer is "yes" and furthermore the corresponding linear graph has the same topology, but with reversed orientation on all branches. If the original process has state probabilities $P_i$ and transition probabilities $p_i(j)$ (probability when in state i of going to state j), then the reverse process has the same state probabilities and the transition probabilities given by:

$q_j(i) = \frac{P_i \, p_i(j)}{P_j}$

This is true since this $q_j(i)$ is merely the a posteriori probability for the original process that when in state j the preceding state was state i. The inverse of Fig. 1 is shown in Fig. 2.

It is interesting to show directly that the entropy $H_R$ of the reverse process is equal to the entropy $H_F$ of the forward process. Of course, this must be true a posteriori from the general properties of entropy. We have

$P_j \, q_j(i) = P_i \, p_i(j)$

Hence

$\sum P_j q_j(i) \log P_j q_j(i) = \sum P_i p_i(j) \log P_i p_i(j)$

or

$\sum P_j q_j(i) \log q_j(i) + \sum P_j q_j(i) \log P_j = \sum P_i p_i(j) \log p_i(j) + \sum P_i p_i(j) \log P_i$

Hence:

$-H_R + \sum P_j \log P_j = -H_F + \sum P_i \log P_i$

so that $H_R = H_F$, the two remaining sums being the same.

C. E. SHANNON

Outline of Talk
American Statistical Society, December 28, 1949

INFORMATION THEORY
by C. E. Shannon
Bell Telephone Laboratories, Inc., Murray Hill, N. J.

1. Information Produced by a Stochastic Process

In communication engineering, we are interested in transmitting messages from one point to another. The messages generally consist of a sequence of individual symbols, such as the letters of printed English, which are governed by probabilities. Thus, in English, there are the various letter frequencies, digram frequencies, etc. The "meaning" of the message (if any) is irrelevant to the engineering problem. Abstractly, then, we may consider a message to be a sequence of meaningless symbols produced by a suitable stochastic process. Communication systems must be designed to handle the ensemble of possible messages; the particular one which will actually occur is not known when the system is constructed. The source producing messages is assumed to have only a finite number of possible internal states.

2. Entropy as a Measure of Information

A suitable measure of the amount of information produced by a discrete stochastic process is given by the entropy H, where

$H = -\lim_{N \to \infty} \frac{1}{N} \sum p(x_1, \ldots, x_N) \log_2 p(x_1, \ldots, x_N)$

in which $x_1, \ldots, x_N$ is a sequence of N symbols produced by the process, $p(x_1, \ldots, x_N)$ is the probability of this sequence, and the sum is over all sequences of this length.

The significance of the quantity H is that it is possible to translate messages from a source with entropy H into a sequence of binary digits (0 or 1) using, on the average, $H + \epsilon$ binary digits per letter of the original message with any positive $\epsilon$. It is not possible to translate so that fewer are used. Thus H measures, in a sense, the equivalent number of binary digits per letter of message. It can be shown that H also determines the amount of channel capacity required for transmission of the original messages.

It is also possible to define an entropy, $H_x(y)$, of one source relative to another. This measures in a sense the uncertainty per letter of the y sequence when the x sequence is known, or the amount of additional information in the y sequence over that available in the x sequence. $H_x(y)$ can be defined as follows:

$H_x(y) = H(x, y) - H(x)$

where $H(x, y)$ is the entropy of the sequence whose elements are the ordered pairs (x, y).

3.
The Nature of Information

While the entropy H measures the amount of information produced by a stochastic process, it does not define the information itself. Thus two entirely different sources might produce information at the same rate (same H) but certainly they are not producing the same information. If we translate the output of a particular source into a different "language" by a reversible operation, the translation may be said to have the same information as the original. Thus we are led to consider the information of a stochastic process as that which is common to all translations obtained from the given process by members of the group of reversible translations, or, alternatively, as the equivalence class of all processes obtained from the given one by such translations. To avoid certain paradoxical situations, involving infinite internal storage in the transducer doing the translating, it is desirable to first limit the group G to translations possible in transducers having a finite number of possible internal states.

The information associated with a process may be denoted by a single letter, say X. Thus X = Y means that Y can be obtained by a translation of X, and conversely. It is possible to set up a metric satisfying the usual postulates as follows:

$\rho(X, Y) = 2H(x, y) - H(x) - H(y).$

With this metric it is possible to define limiting sequences of elements, each of which is an information. Thus a Cauchy sequence, $X_1, X_2, \ldots$, is defined by requiring that

$\lim_{m, n \to \infty} \rho(X_m, X_n) = 0.$

The introduction of these sequences as new elements (analogous to irrational numbers) completes the space in a satisfactory way and enables one to simplify the statement of various results.

4. The Information Lattice

A relation of inclusion, $x \ge y$, between two information elements x and y can be defined by

$x \ge y \equiv H_x(y) = 0.$

This essentially requires that y can be obtained by a suitable finite state operation (or limit of such operations) on x. If $x \ge y$ we call y an abstraction of x.
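The metric and the inclusion relation can be tried out on small cases. The sketch below makes a simplifying assumption that is not in the outline: each process emits independent, identically distributed letters, so that the per-letter entropies reduce to ordinary entropies of a joint distribution over pairs (x, y). All function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, Tuple

Joint = Dict[Tuple[Hashable, Hashable], float]


def entropy(dist: Dict[Hashable, float]) -> float:
    """Entropy in bits of a finite probability distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def marginal(joint: Joint, index: int) -> Dict[Hashable, float]:
    """Marginal distribution of the x letter (index 0) or y letter (index 1)."""
    out: Dict[Hashable, float] = defaultdict(float)
    for pair, p in joint.items():
        out[pair[index]] += p
    return dict(out)


def conditional_entropy(joint: Joint) -> float:
    """H_x(y) = H(x, y) - H(x) for a joint distribution keyed by (x, y)."""
    return entropy(joint) - entropy(marginal(joint, 0))


def distance(joint: Joint) -> float:
    """rho = 2 H(x, y) - H(x) - H(y), under the i.i.d. assumption above."""
    return 2 * entropy(joint) - entropy(marginal(joint, 0)) - entropy(marginal(joint, 1))


if __name__ == "__main__":
    # y is an exact copy of x: the two carry the same information.
    same = {(0, 0): 0.5, (1, 1): 0.5}
    # x and y are independent fair binary digits.
    independent = {(a, b): 0.25 for a in (0, 1) for b in (0, 1)}
    # x is a pair of fair digits and y is its first digit, so y is an abstraction of x.
    first_digit = {((a, b), a): 0.25 for a in (0, 1) for b in (0, 1)}
    print(distance(same), distance(independent), conditional_entropy(first_digit))
```

In these cases the distance is zero when y is a copy of x, two bits when the two are independent fair digits, and $H_x(y)$ vanishes when y is computed from x, in line with the definition of inclusion.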
If $x \ge y$, $y \ge z$, then $x \ge z$. If $x \ge y$, then $H(x) \ge H(y)$. Also $x > y$ means $x \ge y$, $x \ne y$. The information element, one of whose translations is the process which always produces the same symbol, is the 0 element, and $x \ge 0$ for any x.

The sum of two information elements, z = x + y, is the process which produces the ordered pairs $(x_n, y_n)$. We have $z \ge x$, $z \ge y$ and there is no $u < z$ with these properties; z is the least upper bound of x and y. The product z = xy is defined as the largest z such that $z \le x$, $z \le y$; that is, there is no $u > z$ which is an abstraction of both x and y. The product is unique. With these definitions information elements form a metric lattice.

The lattice is not distributive, nor even modular. A non-distributive example is x, y independent sequences of binary digits, with z the sequence obtained by mod 2 addition of corresponding symbols in x and y. Then zy + zx = 0 + 0 = 0, while z(x + y) = z ≠ 0.

The lattices are relatively complemented. There exists for $x \le y$ a z with z + x = y, zx = 0. The element z is not, in general, unique.

5. The Delay Free Group $G_1$

The definition of equality for information based on the group G allows x = y when y is, for example, a delayed version of x: $y_n = x_{n+a}$. In some situations, when one must act on information at a certain time, a delay is not permissible. In such a case we may consider the more restricted group $G_1$ of instantaneously reversible translations. One may define inclusion, sum, product, etc., in an analogous way, and this also leads to a lattice but of much greater complexity and with many different invariants.

Proof of an Integration Formula

C. E. Shannon

The integral

$$f_N(\alpha) = \int_0^{\alpha} \frac{\sin^2 Nx}{\sin^2 x}\,dx \qquad (1)$$

has arisen in an acoustical problem. It has been evaluated for N = 1, 2, 3, 4 as equal to

$$g_N(\alpha) = \alpha N + \sum_{i=1}^{N-1} \frac{N-i}{i}\,\sin 2i\alpha \qquad (2)$$

by R. C. Jones, and he has conjectured that $f_N = g_N$ for all $\alpha$, N. A general proof follows. From (1) we have

$$\Delta^2 f_N = f_N - 2f_{N-1} + f_{N-2} = -\frac{1}{2}\int_0^{\alpha} \frac{\cos 2Nx - 2\cos 2(N-1)x + \cos 2(N-2)x}{\sin^2 x}\,dx$$

and

$$\frac{d}{d\alpha}\,\Delta^2 f_N(\alpha) = -\frac{1}{2}\,\frac{\cos 2N\alpha - 2\cos 2(N-1)\alpha + \cos 2(N-2)\alpha}{\sin^2 \alpha} \qquad (3)$$

Also from (2)

$$\Delta g_N = \alpha + \sum_{i=1}^{N-1} \frac{1}{i}\,\sin 2i\alpha, \qquad \Delta^2 g_N = \frac{1}{N-1}\,\sin 2(N-1)\alpha,$$

$$\frac{d}{d\alpha}\,\Delta^2 g_N(\alpha) = 2\cos 2(N-1)\alpha \qquad (4)$$

The equality of (3) and (4) can be established by noting that the numerator of (3),

$$\cos 2N\alpha - 2\cos 2(N-1)\alpha + \cos 2(N-2)\alpha = \mathrm{Re}\left[e^{j2N\alpha} - 2e^{j2(N-1)\alpha} + e^{j2(N-2)\alpha}\right]$$

$$= \mathrm{Re}\left\{e^{j2(N-1)\alpha}\left[e^{j2\alpha} - 2 + e^{-j2\alpha}\right]\right\} = \mathrm{Re}\left\{e^{j2(N-1)\alpha}\,(2j)^2\left(\frac{e^{j\alpha} - e^{-j\alpha}}{2j}\right)^2\right\}$$

$$= -\mathrm{Re}\left\{4\sin^2\alpha\; e^{j2(N-1)\alpha}\right\} = -4\sin^2\alpha\,\cos 2(N-1)\alpha.$$

Hence

$$\frac{d}{d\alpha}\,\Delta^2 g_N(\alpha) = \frac{d}{d\alpha}\,\Delta^2 f_N(\alpha),$$

but $\Delta^2 g_N(0) = \Delta^2 f_N(0) = 0$, so that

$$\Delta^2 g_N(\alpha) = \Delta^2 f_N(\alpha);$$

also it has been verified that

$$g_1(\alpha) = f_1(\alpha), \qquad g_2(\alpha) = f_2(\alpha).$$

Hence it follows in general that $g_N(\alpha) = f_N(\alpha)$.

A Digital Method of Transmitting Information

It is possible by various types of modulation to improve one aspect of a system for transmitting information at the expense of others. The various quantities which may be exchanged are: 1. Quality of received signal, which can be roughly measured by the signal to noise ratio. 2. Transmitter power. 3. Time of transmission. 4. [Remainder of the list illegible in the scan.] A general theory of how these variables are related will be developed in a forthcoming memorandum. However, speaking roughly and under a number of simplifying assumptions, [the equations that follow are illegible in the scan; the legible fragments of the accompanying definitions refer to a measure of distortion at the receiver, the time of transmission, the band width of the transmitter, and the noise power density, that is, the noise power per unit band width. The rest of the page, which ends with a reference to the total transmitter power and noise power during the transmission time, is illegible.]
Thus, for example, by increasing band width we can decrease transmitter power [illegible]. [Illegible] are two methods of bettering signal to noise ratio at the expense of band width. Neither of these, however, is by any means optimal in the exchange. The present memorandum describes a new method in which essentially the maximum saving of signal power is achieved for a given band width increase. This does not mean that the system of transmission is a theoretically ideal one, for there are several other means of improving received quality [illegible]. [Several lines illegible in the scan; the legible fragments refer to the value of the input modulating function (the speech function in telephone and radio) at a sequence of regularly spaced sampling points.]

Thus 9 = 8 + 4 - 2 - 1, or -5 = -8 + 4 - 2 + 1 [figures partly illegible in the scan].

A transmitter for this system could be built in the following way. A condenser is charged as usual to the sampled voltage. This voltage is read on a comparator biased up to [illegible] half the [illegible]. If the comparator gives a positive indication an electronic switch is closed feeding a negative pulse of $2^n$ units of charge into the condenser; if not a positive pulse of $2^n$ units is fed in. The comparator is now switched to control a new pulse source which produces pulses of $2^{n-1}$ units and the process is repeated. Thus the circuit feeds in positive or negative pulses of decreasing magnitude "hunting" for a balance. At each stage a recorder remembers whether a positive or negative pulse was used. These positive and negative recordings actually are the binary representation of the original voltage, as one can see by reading the above table with -1 replaced by 0.
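The "hunting" procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sampled voltage is taken to be an odd integer measured about mid-scale (so that it can be negative), the comparator threshold is taken to be zero, and the recorder stores the sign of each digit, which is the opposite of the polarity of the pulse fed into the condenser. The function names are hypothetical, and the worked example 5 = 8 - 4 + 2 - 1 is chosen here for illustration rather than taken from the memorandum's table.

```python
def hunt(voltage, n):
    """Balance an odd integer voltage with pulses of 2**n, 2**(n-1), ..., 1 units.

    Returns the recorded digits (+1 or -1), largest pulse first, such that
    voltage == sum(digit * 2**k) over k = n, n-1, ..., 0.
    """
    if voltage % 2 == 0 or abs(voltage) >= 2 ** (n + 1):
        raise ValueError("sketch assumes an odd integer with |voltage| < 2**(n+1)")
    residual = voltage
    digits = []
    for k in range(n, -1, -1):
        if residual > 0:            # comparator gives a positive indication
            residual -= 2 ** k      # feed in a negative pulse of 2**k units
            digits.append(+1)
        else:                       # otherwise feed in a positive pulse
            residual += 2 ** k
            digits.append(-1)
    assert residual == 0            # the condenser has been brought to balance
    return digits


def to_binary(digits):
    """Read the recorded digits with -1 replaced by 0."""
    return [1 if d > 0 else 0 for d in digits]


def value_of(digits):
    """Rebuild the voltage from its +1/-1 digits."""
    top = len(digits) - 1
    return sum(d * 2 ** (top - i) for i, d in enumerate(digits))


digits = hunt(5, 3)                 # 5 = 8 - 4 + 2 - 1
bits = to_binary(digits)            # binary digits 1, 0, 1, 0
print(digits, bits, value_of(digits))
```

Under these assumptions the binary number read off the recorder equals (voltage + 2^(n+1) - 1) / 2, that is, the voltage measured from the bottom of the range in steps of two units.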
Hence the receiver of Fig. 4 can be used without alteration in this system.

Creative Thinking

Up to 100% of the amount of ideas produced, useful good ideas produced by these signals, these are supposed to be arranged in order of increasing ability. At producing ideas, we find a curve something like this. Consider the number of curves produced here - going up to enormous height here. A very small percentage of the population produces the greatest proportion of the important ideas. This is akin to an idea presented by an English mathematician, Turing, that the human brain is something like a piece of uranium. The human brain, if it is below the critical lap and you shoot one neutron into it, additional more would be produced by impact. It leads to an extremely explosive [...] of the issue, increase the size of the uranium. Turing says this is something like ideas in the human brain. There are some people if you shoot one idea into the brain, you will get a half an idea out. There are other people who are beyond this point at which they produce two ideas for each idea sent in. Those are the people beyond the knee of the curve. I don't want to sound egotistical here, I don't think that I am beyond the knee of this curve and I don't know anyone who is. I do know some people that were. I think, for example, that anyone will agree that Isaac Newton would be well on the top of this curve. When you think that at the age of 25 he had produced enough science, physics and mathematics to make 10 or 20 men famous - he produced binomial theorem, differential and integral calculus, laws of gravitation, laws of motion, decomposition of white light, and so on. Now what is it that shoots one up to this part of the curve? What are the basic requirements?

I think we could set down three things that are fairly necessary for scientific research or for any sort of inventing or mathematics or physics or anything along that line.
I don't think a person can get along without any one of these three. The first one is obvious - training and experience. You don't expect a lawyer, however bright he may be, to give you a new theory of physics these days or mathematics or engineering.

The second thing is a certain amount of intelligence or talent. In other words, you have to have an IQ that is fairly high to do good research work. I don't think that there is any good engineer or scientist that can get along on an IQ of 100, which is the average for human beings. In other words, he has to have an IQ higher than that. Everyone in this room is considerably above that. This, we might say, is a matter of environment; intelligence is a matter of heredity.

Those two I don't think are sufficient. I think there is a third constituent here, a third component which is the one that makes an Einstein or an Isaac Newton. For want of a better word, we will call it motivation. In other words, you have to have some kind of a drive, some kind of a desire to find out the answer, a desire to find out what makes things tick. If you don't have that, you may have all the training and intelligence in the world, you don't have questions and you won't just find answers. This is a hard thing to put your finger on. It is a matter of temperament probably; that is, a matter of probably early training, early childhood experiences, whether you will motivate in the direction of scientific research. I think that at a superficial level, it is blended use of several things. This is not any attempt at a deep analysis at all, but my feeling is that a good scientist has a great deal of what we can call curiosity. I won't go any deeper into it than that. He wants to know the answers. He's just curious how things tick and he wants to know the answers to questions; and if he sees things, he wants to raise questions and he wants to know the answers to those.

Then there's the idea of dissatisfaction.
By this I don't mean a pessimistic dissatisfaction of the world - we don't like the way things are - I mean a constructive dissatisfaction. The idea could be expressed in the words, "This is OK, but I think things could be done better. I think there is a neater way to do this. I think things could be improved a little." In other words, there is continually a slight irritation when things don't look quite right; and I think that dissatisfaction in present days is a key driving force in good scientists.

And another thing I'd put down here is the pleasure in seeing net results or methods of arriving at results needed, designs of engineers, equipment, and so on. I get a big bang myself out of proving a theorem. If I've been trying to prove a mathematical theorem for a week or so and I finally find the solution, I get a big bang out of it. And I get a big kick out of seeing a clever way of doing some engineering problem, a clever design for a circuit which uses a very small amount of equipment and gets apparently a great deal of result out of it. I think so far as motivation is concerned, it is maybe a little like Fats Waller said about swing music - either you got it or you ain't. If you ain't got it, you probably shouldn't be doing research work if you don't want to know that kind of answer. Although people without this kind of motivation might be very successful in other fields, the research man should probably have an extremely strong drive to want to find out the answers, so strong a drive that he doesn't care whether it is 5 o'clock - he is willing to work all night to find out the answers and all weekend if necessary.

Well now, this is all well and good, but supposing a person has these three properties to a sufficient extent to be useful, are there any tricks, any gimmicks that he can apply to thinking that will actually aid in creative work, in getting the answers in research work, in general, in finding answers to problems?
I think there are, and I think they can be catalogued to a certain extent. You can make quite a list of them and I think they would be very useful if one did that, so I am going to give a few of them which I have thought up or which people have suggested to me. And I think if one consciously applied these to various problems you had to solve, in many cases you'd find solutions quicker than you would normally or in cases where you might not find it at all. I think that good research workers apply these things unconsciously; that is, they do these things automatically and if they were brought forth into the conscious thinking that here's a situation where I would try this method of approach that would probably get there faster, although I can't document this statement.

The first one that I might speak of is the idea of simplification. Suppose that you are given a problem to solve, I don't care what kind of a problem - a machine to design, or a physical theory to develop, or a mathematical theorem to prove, or something of that kind - probably a very powerful approach to this is to attempt to eliminate everything from the problem except the essentials; that is, cut it down to size. Almost every problem that you come across is befuddled with all kinds of extraneous data of one sort or another; and if you can bring this problem down into the main issues, you can see more clearly what you're trying to do and perhaps find a solution. Now, in so doing, you may have stripped away the problem that you're after. You may have simplified it to a point that it doesn't even resemble the problem that you started with; but very often if you can solve this simple problem, you can add refinements to the solution of this until you get back to the solution of the one you started with.

A very similar device is seeking similar known problems. I think I could illustrate this schematically in this way.
You have a problem P here and there is a solution S which you do not know yet perhaps over here. If you have experience in the field represented, that you are working in, you may perhaps know of a somewhat similar problem, call it P', which has already been solved and which has a solution, S'. All you need to do - all you may have to do is to find the analogy from P' here to P and the same analogy from S' to S in order to get back to the solution of the given problem. This is the reason why experience in a field is so important that if you are experienced in a field, you will know thousands of problems that have been solved. Your mental matrix will be filled with P's and S's unconnected here and you can find one which is tolerably close to the P that you are trying to solve and go over to the corresponding S' in order to go back to the S you're after. It seems to be much easier to make two small jumps than the one big jump in any kind of mental thinking.

Another approach for a given problem is to try to restate it in just as many different forms as you can. Change the words. Change the viewpoint. Look at it from every possible angle. After you've done that, you can try to look at it from several angles at the same time and perhaps you can get an insight into the real basic issues of the problem, so that you can correlate the important factors and come out with the solution. It's difficult really to do this, but it is important that you do. If you don't, it is very easy to get into ruts of mental thinking. You start with a problem here and you go around a circle here and if you could only get over to this point, perhaps you would see your way clear; but you can't break loose from certain mental blocks which are holding you in certain ways of looking at a problem. That is the reason why very frequently someone who is quite green to a problem will sometimes come in and look at it and find the solution like that, while you have been laboring for months over it.
You've got set into some ruts here of mental thinking and someone else comes in and sees it from a fresh viewpoint.

Another mental gimmick for aid in research work, I think, is the idea of generalization. This is very powerful in mathematical research. The typical mathematical theory developed in the following way to prove a very isolated, special result, particular theorem - someone always will come along and start generalizing it. He will leave it where it was in two dimensions before he will do it in N dimensions; or if it was in some kind of algebra, he will work in a general algebraic field; if it was in the field of real numbers, he will change it to a general algebraic field or something of that sort. This is actually quite easy to do if you only remember to do it. The minute you've found an answer to something, the next thing to do is to ask yourself if you can generalize this any more - can I make a broader statement which includes more? There, I think, in terms of engineering, the same thing should be kept in mind. As you see, if somebody comes along with a clever way of doing something, one should ask oneself "Can I apply the same principle in more general ways? Can I use this same clever idea represented here to solve a larger class of problems? Is there any place else that I can use this particular thing?"

Next one I might mention is the idea of structural analysis of a problem. Supposing you have your problem here and a solution here. You may have too big a jump to take. What you can try to do is to break down that jump into a large number of small jumps. If this were a set of mathematical axioms and this were a theorem or conclusion that you were trying to prove, it might be too much for me to try to prove this thing in one fell swoop. But perhaps I can visualize a number of subsidiary theorems or propositions such that if I could prove those, in turn I would eventually arrive at this solution.
In other words, I set up some path through this domain with a set of subsidiary solutions, 1, 2, 3, 4, and so on, and attempt to prove this on the basis of that and then this on the basis of these which I have proved until eventually I arrive at the path S. Many proofs in mathematics have been actually found by extremely roundabout processes. A man starts to prove this theorem and he finds that he wanders all over the map. He starts off and proves a good many results which don't seem to be leading anywhere and then eventually ends up by the back door on the solution of the given problem; and very often when that's done, when you've found your solution, it may be very easy to simplify; that is, to see at one stage that you may have short-cutted across here and you could see that you might have short-cutted across there. The same thing is true in design work. If you can design a way of doing something which is obviously clumsy and cumbersome, uses too much equipment; but after you've really got something you can get a grip on, something you can hang on to, you can start cutting out components and seeing some parts were really superfluous. You really didn't need them in the first place.

Now one other thing I would like to bring out which I run across quite frequently in mathematical work is the idea of inversion of the problem. You are trying to obtain the solution S on the basis of the premises P and then you can't do it. Well, turn the problem over supposing that S were the given proposition, the given axioms, or the given numbers in the problem and what you are trying to obtain is P. Just imagine that that were the case. Then you will find that it is relatively easy to solve the problem in that direction. You find a fairly direct route. If so, it's often possible to invert it in small batches. In other words, you've got a path marked out here - there you got relays you sent this way.
You can see how to invert these things in small stages and perhaps three or four only difficult steps in the proof.

Now I think the same thing can happen in design work. Sometimes I have had the experience of designing computing machines of various sorts in which I wanted to compute certain numbers out of certain given quantities. This happened to be a machine that played the game of nim and it turned out that it seemed to be quite difficult. It took quite a number of relays to do this particular calculation although it could be done. But then I got the idea that if I inverted the problem, it would have been very easy to do - if the given and required results had been interchanged; and that idea led to a way of doing it which was far simpler than the first design. The way of doing it was doing it by feedback; that is, you start with the required result and run it back until - run it through its value until it matches the given input. So the machine itself was worked backward putting range S over the numbers until it had the number that you actually had and, at that point, until it reached the number such that P shows you the correct way.

Well, now the solution for this philosophy which is probably very boring to most of you. I'd like now to show you this machine which I brought along and go into one or two of the problems which were connected with the design of that because I think they illustrate some of these things I've been talking about. In order to see this, you'll have to come up around it; so, I wonder whether you will all come up around the table now.

Bell Telephone Laboratories, Incorporated
Cover Sheet for Technical Memorandum

SUBJECT: The Relay Circuit Analyzer - Case 22103
MM-53-1400-9, MM-53-1300-17
DATE: March 31, 1953
AUTHORS: C. E. Shannon, E. F. Moore
FILING SUBJECT (to be assigned by author): Switching Theory

COPIES TO: Case File; Date File; Area Central Files (4);
1 - Patent Dept. (2); 2 - R. Bown; 3 - W. H. Doherty; 4 - H. H. Abbott; 5 - A. O. Adam; 6 - A. E. Anderson; 7 - E. G. Andrews; 8 - M. M. Atalla; 9 - H. W. Bode; 10 - C. Breen;
11 - C. E. Brooks; 12 - E. Bruce; 13 - A. Burkett; 14 - A. J. Busch; 15 - R. L. Carmichael; 16 - A. B. Clark; 17 - C. Clos; 18 - R. C. Davis; 19 - J. W. Dehn; 20 - T. C. Dimond;
21 - K. S. Dunlap; 22 - F. S. Entz; 23 - J. H. Felker; 24 - J. G. Ferguson; 25 - E. B. Ferrell; 26 - G. E. Fessler; 27 - W. O. Fleckenstein; 28 - J. B. Fisk; 29 - G. R. Frost; 30 - T. C. Fry;
31 - E. N. Gilbert; 32 - G. W. Gilman; 33 - K. Goldschmidt; 34 - R. E. Hersey; 35 - B. D. Holbrook; 36 - A. W. Horton, Jr.; 37 - L. W. Hussey; 38 - P. Husta; 39 - A. E. Joel, Jr.; 40 - M. Karnaugh;
41 - A. C. Keller; 42 - W. Keister; 43 - G. V. King; 44 - F. A. Korn; 45 - W. J. Laggy; 46 - C. Y. Lee; 47 - E. C. Lee; 48 - W. D. Lewis; 49 - C. A. Lovell; 50 - F. K. Low;
51 - A. A. Lundstrom; 52 - M. E. Maloney; 53 - C. H. McCandless; 54 - B. McKim; 55 - B. McMillan; 56 - B. McWhan; 57 - G. H. Mealy; 58 - J. Meszar; 59 - [?] G. Miller; 60 - P. Mitchell;
61 - E. F. Moore; 62 - O. J. Murphy; 63 - O. Myers; 64 - [?] B. Myers; 65 - N. D. Newby; 66 - G. A. Pullis; 67 - W. T. Rea; 68 - A. E. Ritchie; 69 - R. W. Roberts; 70 - C. Rosenthal;
71 - J. P. Runyon; 72 - R. M. Ryder; 73 - H. N. Seckler; 74 - C. E. Shannon; 75 - H. S. Shapiro; 76 - F. F. Shipley; 77 - F. J. Singer; 78 - B. [?] Slepian; 79 - L. J. Stacy; 80 - R. E. Staehler;
81 - E. E. Sumner; 82 - F. W. Tatum; 83 - J. G. Tryon; 84 - S. H. Washburn; 85 - E. F. Watson; 86 - A. Weaver; 87 - W. Whitney; 88 - [?] G. Wilson; 89 - P. L. Wright

(See next page for Abstract)

MM-53-1400-9, MM-53-1300-17
March 31, 1953

ABSTRACT

This memorandum describes a machine (made of relays, selector switches, gas diodes, and germanium diodes) for analyzing several properties of any combinational relay circuit which uses four relays or fewer.
This machine, called the relay circuit analyzer, contains an array of switches on which the specifications that the circuit is expected to satisfy can be indicated, as well as a plugboard on which the relay circuit to be analyzed can be set up. The analyzer can (1) verify whether the circuit satisfies the specifications, (2) make certain kinds of attempts to reduce the number of contacts used, and also (3) perform rigorous mathematical proofs which give lower bounds for the numbers and types of contacts required to satisfy given specifications.

The Relay Circuit Analyzer - Case 22103
MM-53-1400-9, MM-53-1300-17
March 31, 1953

MEMORANDUM FOR FILE

1. Introduction

Some operations which assist in the design of relay circuits or other types of switching circuits can be described in very simple form, and machines can be constructed which perform them more quickly and more accurately than a human being can. It seems possible that machines of this type will be useful to those whose work involves the design of such circuits. This is the first of two memoranda describing particular machines of this kind which have been built.

The present machine, called the relay circuit analyzer, is intended for use in connection with the design of two terminal circuits made up of contacts on at most four relays. The principles upon which this machine is based are not limited to two terminal networks or to four relays, although an enlarged machine would require more time to operate. Each addition of one relay to the circuits considered would approximately double the size of the machine and quadruple the length of time required for its operation. This type of machine is not applicable to sequential circuits, however, so it will be of use only in connection with parts of the relay circuits which contain contacts, but no relay coils.

2.
Operation of the Machine

The machine, as can be seen from Photograph 196492, contains sixteen 3-position switches, which are used to specify the requirements of the circuit. One switch corresponds to each of the $2^4 = 16$ states in which the four relays can be put. Switch No. 2 in the upper righthand corner, for instance, is labeled W + X + Y' + Z, which corresponds to the state of the circuit in which the relays labeled W, X, and Z are operated, and the relay labeled Y is released. The three positions of this switch correspond to the requirements which can be imposed on the condition of the circuit when the relays are in the corresponding state.

Since any single relay contact circuit assumes only one of two values (open or closed), the inclusion of a third value (doesn't matter, don't care, or vacuous, as it has been called by various persons) merits some explanation. If the machine, of which the relay circuit being designed is to be a part, only permits these relays to take on a fraction of the $2^n$ combinations of which n relays are capable, then (except when considering what the machine will do in case of relay failures) any circuits which agree on the combinations actually assumed will be equivalent in their properties. Since the class of circuits which agree with what is wanted just in the necessary combinations is larger than the class of those which agree in all combinations, the former class can and frequently will contain members using fewer contacts. Hence the switch corresponding to each state is put into the don't care position if the circuit will never assume that state, or if for any other reason the behavior when in that state is immaterial. The sixteen 3-position switches thus permit the user not only to require the circuit under consideration to have exactly some particular hindrance function, but also allow the machine more freedom in the cases where the circuit need not be specified completely.
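The role of the don't care position can be made concrete with a short Python sketch. It models the sixteen switches as a table from relay states to one of three settings and checks whether a circuit, given here simply as a Boolean function of the four relays, agrees with every state that is actually specified. The requirement used in the example (relays Y and Z always operating together) is a hypothetical one chosen for illustration, as are all the names; none of it is taken from the memorandum.

```python
from itertools import product

CLOSED, OPEN, DONT_CARE = "closed", "open", "don't care"


def make_switches(requirement):
    """Build the 16 switch settings from a function of the state (w, x, y, z)."""
    return {state: requirement(*state) for state in product((0, 1), repeat=4)}


def agrees(circuit, switches):
    """True if the circuit matches every switch that is not set to don't care."""
    for state, wanted in switches.items():
        if wanted == DONT_CARE:
            continue
        if bool(circuit(*state)) != (wanted == CLOSED):
            return False
    return True


def requirement(w, x, y, z):
    """Hypothetical spec: Y and Z always operate together, so states with
    y != z never occur; otherwise the circuit is closed iff W, Y and Z operate."""
    if y != z:
        return DONT_CARE
    return CLOSED if (w and y and z) else OPEN


switches = make_switches(requirement)


def three_contacts(w, x, y, z):
    return w and y and z        # series circuit W, Y, Z


def two_contacts(w, x, y, z):
    return w and y              # differs only in don't care states


def one_contact(w, x, y, z):
    return w                    # too simple: closed when it should be open


print(agrees(three_contacts, switches),
      agrees(two_contacts, switches),
      agrees(one_contact, switches))
```

Here the two-contact circuit is not equivalent to the three-contact one in all sixteen states, yet it satisfies the same switch settings, which is the sense in which the larger class of acceptable circuits can contain members using fewer contacts.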
In order to make a machine of this type to deal with n relays (this particular machine was made for the case n = 4), $2^n$ such switches would be required, corresponding to the $2^n$ states n relays can assume. In each of these states the circuit can be either open or closed, so there are $2^{2^n}$ functionally distinct circuits. But since each switch has 3 positions, there are $3^{2^n}$ distinct circuit requirements specifiable on the switches, which in the case n = 4 amounts to 43,046,721. Thus, the number of problems which the analyzer must deal with is quite large, even in the case of only four relays.

The left half of the front panel of the machine (see Photograph No. 196492) is a plugboard on which the circuit being analyzed can be represented. There are three transfers from each of the four relays, W, X, Y, and Z, brought out to jacks on this panel, and two plugs representing the terminals of the network are at the top and bottom. Using these, as well as some patch cords, it is possible to plug up any circuit using at most three transfers on each of the four relays. This number of contacts is sufficient to give a circuit representing any switching function of four variables.

If the specifications for the circuit have been put on the sixteen switches, and if the circuit has been put on the plugboard, the relay circuit analyzer is then ready to operate. With the main control switch and the evaluate-compare switch both in the evaluate position, pressing the start button will cause the analyzer to evaluate the circuit plugged in, and to indicate in which of the states the circuit is closed by lighting up the corresponding indicator lamps.

Turning the evaluate-compare switch to the compare position, the analyzer then checks whether the circuit disagrees with the requirements given on the switches. A disagreement is indicated by lighting the lamp corresponding to the state in question. If the switch is set for closed and the actual circuit is open in that state, or vice versa, a disagreement is indicated, but no disagreement is ever registered when the switch is in the don't care position.

When the main control switch is turned to the short test position and the start button is pressed again, the analyzer determines whether any of the contacts in the circuit could have been shorted out, with the circuit still satisfying the requirements. The machine indicates on the lamps beside the contacts which ones have this property.

It may be surprising to the reader that anyone would ever need the assistance of a machine to find a contact which could be shorted out without affecting the circuit. While this is certainly true of simple examples, in more complicated circuits such redundant elements are often far from obvious, particularly if there are some states for which the switches are in the don't care position, since the simplified circuit may differ from the original one only in the don't care states. It is often quite difficult to see the simplification in these cases.

The analyzer is also helpful in case the circuit being analyzed is a bridge, because of the complications involved in tracing out all paths in the bridge. The circuit shown in Figure 1 is an example of a circuit which was not known to be inefficiently designed until put on the analyzer. It determined in less than two minutes (including the time required to plug the circuit into the plugboard) that one of the contacts shown can be shorted out. How likely would a human being be to solve this same problem in the same length of time?

After the short test has been performed, putting the main control switch in the open test position permits the analyzer to perform another analogous test, this time opening the contacts one at a time.

These two particular types of circuit changes were chosen because they are easy to carry out, and whenever successful, either one reduces the number of contacts required. There are other types of circuit simplification which it might be desirable to have a machine perform, including various rearrangements of the circuit. These would have required more time as well as more equipment to perform, but would probably have caused the machine to be more frequently successful in simplifying the circuit. Using such techniques, it might be possible to build a machine which could design circuits efficiently starting from basic principles, perhaps by starting with a complete Boolean expansion for the desired function and simplifying it step by step. Such a machine would be rather slow (unless it were built to operate at electronic speeds, and perhaps even in this case), and not enough planning has been done to know whether such a machine is practically feasible, but the fact that such a machine is theoretically possible is certainly of interest, whether anyone builds one or not.

Another question of theoretical interest is whether a logical machine could be built which could design an improved version of itself, or perhaps build some machine whose over-all purpose was more complicated than its own. There seems to be no logical contradiction involved in such a machine, although it will require great advances in the general theory of automata before any such project could be confidently undertaken.

To return to the relay circuit analyzer, a final operation which it performs is done with the main control switch in the prove position. On pressing the start button and moving the other 4-position switch successively through the W, X, Y, and Z positions, certain of the eight lamps W, W', X, X', Y, Y', Z, Z' will light up. The analyzer has carried out a proof as to which kinds of contacts are required to synthesize the function using the method of reduction to functions of one variable, which will be explained in a forthcoming memorandum.
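The short test and open test described earlier in this section can be sketched in Python as follows. This is a minimal illustration, not a model of the analyzer's actual relay circuitry: the plugged-up circuit is represented as a list of contacts between named nodes, the two terminals are assumed to be called T1 and T2, and the requirements are given as a function returning True (closed), False (open) or None (don't care). The example circuit and requirement are hypothetical.

```python
from itertools import product

RELAYS = "WXYZ"


def is_closed(contacts, state, terminals=("T1", "T2")):
    """True if the terminals are connected through conducting contacts.

    contacts: list of (node_a, node_b, literal), where literal is "W" (make
    contact, conducts when W is operated), "W'" (break contact), "SHORT" or "OPEN".
    state: dict mapping each relay letter to True (operated) or False (released).
    """
    links = {}
    for a, b, literal in contacts:
        if literal == "SHORT":
            conducting = True
        elif literal == "OPEN":
            conducting = False
        elif literal.endswith("'"):
            conducting = not state[literal[0]]
        else:
            conducting = state[literal]
        if conducting:
            links.setdefault(a, set()).add(b)
            links.setdefault(b, set()).add(a)
    seen, stack = {terminals[0]}, [terminals[0]]
    while stack:                        # simple search for a closed path
        node = stack.pop()
        for nxt in links.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return terminals[1] in seen


def satisfies(contacts, spec):
    """Compare the circuit with the requirements in all sixteen states."""
    for values in product((False, True), repeat=4):
        state = dict(zip(RELAYS, values))
        wanted = spec(state)            # True, False, or None for don't care
        if wanted is not None and is_closed(contacts, state) != wanted:
            return False
    return True


def removable(contacts, spec, replacement):
    """Indices of contacts that can be replaced by "SHORT" or "OPEN",
    one at a time, with the circuit still satisfying the requirements."""
    found = []
    for i, (a, b, _) in enumerate(contacts):
        trial = contacts[:i] + [(a, b, replacement)] + contacts[i + 1:]
        if satisfies(trial, spec):
            found.append(i)
    return found


# Hypothetical example: W, X, Y in series; states with Y released never occur.
circuit = [("T1", "A", "W"), ("A", "B", "X"), ("B", "T2", "Y")]


def spec(state):
    if not state["Y"]:
        return None                     # don't care
    return state["W"] and state["X"]


shortable = removable(circuit, spec, "SHORT")
openable = removable(circuit, spec, "OPEN")
print(shortable, openable)
```

In this example only the Y contact can be shorted, and the simplified circuit differs from the original solely in the don't care states, which is the situation the text describes as hard to spot by inspection.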
The analyzer here ignores whatever circuit has been plugged in the plugboard, and considers only the function specified by the sixteen 3-position switches. If every circuit which satisfies these specifications requires a back contact on the W relay, the W' light will go on, etc.

If, for instance, seven of the eight lights are on, any circuit for the function requires at least seven contacts, and if there is in fact a circuit which uses just seven, the machine has, in effect, given a complete proof that this circuit is minimal. Circuits for which the machine can give such a complete proof are fairly common, although there are also circuits (which can be shown to be minimal by more subtle methods of proof) which this machine could not prove minimal. An example is the circuit of Figure 1. This can be simplified by the analyzer to a circuit of nine contacts, but in the prove position the analyzer merely indicates that at least eight contacts are necessary. It can be shown by other methods that the 9-contact circuit is minimal. But at any rate, the analyzer always gives a mathematically rigorous lower bound for the number of contacts.

3. The Circuit and Operation of the Relay Circuit Analyzer

A complete circuit diagram of the analyzer is shown in Figures 2 and 3. The circuit, as already mentioned, has five modes of operation: 1. evaluating a circuit, 2. comparing a circuit with desired characteristics, 3. examining a circuit for contacts that can be shorted without affecting operation, 4. examining for contacts that can be opened without affecting operation, and 5. proving that certain contacts are necessary in any realization of the function. The method of operation of the circuit will be described in turn for each of these five modes of behavior.

4.
Evaluation of a Circuit

In this mode of operation the machine goes through in sequence the sixteen possible states of the relays W, X, Y and Z that are involved in the circuit and tests in each state whether or not the circuit is closed. If it is closed, the corresponding panel light is lit. In this process only the right-hand part of the circuit in Figure 2 is involved and switches S18 and S19 are both in the evaluate position. The selector switch S17 goes through one complete revolution to make this test. During this revolution the four relays W, X, Y, and Z proceed sequentially through their sixteen states. This sequence is produced by the first two wipers and decks of the selector switch S17. At the first position (0000) all four relays are unoperated. At the second step (0001), ground on the second wiper operates relay Z, which locks in on its own front contact. The circuit is then set to test the situation where W, X and Y are unoperated and Z is operated. At the third step (0011) relay Y is operated and locks in on its own front contact. At the fourth step Z is short-circuited by the wiper of the first deck. This releases Z and produces the state 0010. Proceeding in this manner it will be seen that the four relays W, X, Y and Z go through the sixteen states indicated.

The circuit which is being tested may be thought of as being connected between plugs P1 and P2 at the upper left of the diagram. This network consists of contacts on the four relays W, X, Y and Z. Actually some other contacts are involved in the network between P1 and P2 (contacts on the H relays) but in the present mode of operation these H relays do not operate and do not affect the hindrance from P1 to P2. For a given state of the relays W, X, Y and Z the plugs P1 and P2 will be connected together if, and only if, the circuit being tested is closed for that state of the relays. The relay G will, therefore, operate if, and only if, the circuit is closed in the state in question.
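The evaluation sweep described above can be sketched in Python. This is only an illustrative model, not the analyzer's wiring: the contact-list representation, the function names `gray_states`, `is_closed` and `evaluate`, and the node label "n" are assumptions made for the sketch. The first four states named in the text (0000, 0001, 0011, 0010) match a reflected Gray code, so the sketch assumes the remaining states continue in that order.

```python
RELAYS = "WXYZ"

def gray_states():
    """Yield the 16 relay states (W, X, Y, Z) so that one relay changes per step.

    Assumption: the selector follows the reflected Gray code, which agrees
    with the first four states given in the text.
    """
    for i in range(16):
        g = i ^ (i >> 1)
        yield tuple((g >> (3 - k)) & 1 for k in range(4))

def is_closed(contacts, state, src="P1", dst="P2"):
    """Return True if some path of conducting contacts joins src to dst.

    contacts: list of (node_a, node_b, relay, kind), kind is "make" or "break".
    """
    operated = dict(zip(RELAYS, state))
    adjacency = {}
    for a, b, relay, kind in contacts:
        # A make contact conducts when its relay is operated, a break when not.
        if (operated[relay] == 1) == (kind == "make"):
            adjacency.setdefault(a, set()).add(b)
            adjacency.setdefault(b, set()).add(a)
    seen, stack = {src}, [src]
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return False

def evaluate(contacts):
    """Map each state to True (lamp lit, circuit closed) or False."""
    return {state: is_closed(contacts, state) for state in gray_states()}

# Hypothetical test circuit: W make in series with (X make in parallel with Y break).
example = [
    ("P1", "n", "W", "make"),
    ("n", "P2", "X", "make"),
    ("n", "P2", "Y", "break"),
]
lamps = evaluate(example)
```

Here `lamps` plays the role of the sixteen neon lamps: one entry per relay state, lit when the plugged circuit is closed in that state.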
If it is closed, a ground will be applied to the third wiper of the selector switch S17 and this will fire the corresponding neon lamp. If it is not closed, +24 volts will be applied to the lamp, extinguishing it (if it is already fired). The voltage across the lamp circuit, 64-24 or about 60 volts, lies between the fire and sustain voltages for the neon lamps. Consequently, if they are fired they will remain fired; if extinguished they will remain out. Thus the lamps remain in the state produced by the evaluation of the circuit even after the wiper has left the point in question.

The movement of the stepping switch is produced by a three-stage buzzer circuit consisting of relays U, V and P. In the buzzing condition the parallel S' and T' combination in series with U will be closed. The operation of U energizes V through the front U contact in series with the V coil. The operation of V then operates P in a similar manner. The operation of P releases U through the P' contact. This releases V, which releases P, etc.

At the start of an evaluation, switch S18 will be in the evaluate position, switch S19 in the evaluate position, selector switch S17 at position 22 (and relay S, therefore, operated) and selector switch S16 at position 21 (with relay T, therefore, operated). When the starting push button S20 is pressed, magnet M1 of stepping switch 1 is energized. When S20 is released, M1 releases and the stepping switch moves to position one. This releases relay S and the three-stage buzzer U, V, P starts operating. At each cycle of this buzzer the coil of selector switch S17 is energized and released by a make contact on the P relay. This sequences the relays W, X, Y and Z through their sixteen states, as already described, and indicates on the neon lamps the states for which the circuit being tested is closed. When the wipers reach level 22, relay S operates, stopping the buzzer and ending the test.

5.
The Comparison Mode of Operation

In this mode of operation the circuit set up on the plugboard is to be compared with the settings of the sixteen three-position switches. If in any state the circuit disagrees with the switch setting, the corresponding neon lamp will light up. For this test switch S18 is set in the evaluate position and switch S19 in the compare position. When the starting push button S20 is pressed, the buzzing circuit U, V, P starts as before, cycling the selector switch S17 through one complete revolution. The four relays, as before, go through their sixteen possible states and the relay G, as before, operates or not, depending on whether the circuit being tested is closed or not. The lamps, however, are no longer controlled directly by the relay G, but instead by contacts on the relay A. The relay A is connected to operate if, and only if, the circuit condition of the network being tested (open or closed) disagrees with the setting of the corresponding three-position switch. This result is obtained by having one end of the coil of relay A connected (via the fourth wiper of selector switch S17) to +24 volts, nothing (i.e. floating) or minus, according as the desired behavior of the circuit in the state in question is open, "don't care", or closed (as represented by the setting of the three-position switch). The other end of the relay A is connected to +24 volts or minus, according as the actual circuit under test is open or closed (this being carried out by a transfer on the G relay). The relay A will operate only if the two ends of the coil receive different polarities, and this will occur only if the switch setting differs from the state of the network under test as indicated by the state of the relay G. If such a disagreement occurs, the corresponding lamp is fired by a ground coming in the third wiper of selector switch S17. The starting and stopping are carried out by the same means as used in the evaluate mode.

6.
The Short Test

In testing for contacts in the circuit that can be shorted, the sequencing is somewhat more involved. Roughly speaking, the various contacts used in the circuit are short-circuited one by one, and for each contact the circuit goes through a sequence similar to the comparing mode of behavior just described (comparing the circuit when this contact is shorted with the desired characteristics set up on the three-position switches). If any disagreement is found, the neon lamp associated with the contact in question is fired, indicating that this contact is necessary in the circuit and cannot be shorted. Actually, the sequence is a bit more complicated since, to save time and equipment, the tests on the make and break parts of a transfer in the circuit being tested are interleaved.

To carry out the short test, switch S18 is put in the short position (the position of S19 is irrelevant). The selector switches S16 and S17 start in positions 21 and 22 respectively, so that relays S and T are both operated. When the starting button S20 is pressed, the magnets of both S16 and S17 are energized, and when S20 is released they step ahead one step, releasing both S and T and allowing the buzzer circuit to start. The first step of selector switch S16 causes E to operate. This removes the voltage from the indicating lamps L16 to L39 (removing any indication on these lamps from previous runs). Stepper 1 then proceeds through a complete revolution. At step 17 the second wiper applies a voltage to the coil of S16, pulsing S16 ahead one notch. This releases E, and reapplies voltage to the indicating lamps L16 to L39. The wipers of selector switch S16 are now connected to position 1 (the top row) of this selector. The sixth wiper operates relay H1, which disconnects the first W transfer from the circuit being tested.
The three points in the circuit being tested that were previously connected to this transfer (on the W relay) are brought down to points P3, P5 and P7, P5 coming through the third wiper. The free ends of the W transfer, which are now disconnected from the circuit being tested, are brought down via wipers 2 and 4. To test whether either part of this transfer can be shorted, the selector switch S17 goes through a complete cycle, putting the relays W, X, Y and Z in each possible state as in previous modes of operation. In each state, the first test is to short P3 to P5, which in effect shorts the nodes of the circuit normally connected to the W part of the contact, and the circuit state is compared with the desired specification on the three-position switch. A disagreement operates relay A which, by way of wiper 1, fires the lamp corresponding to the W contact. This shorting of the nodes occurs in the buzzer cycle during the period when the relay U is operated. The A contact is connected to the corresponding lamp through contacts V and P' in series. This gives relay A time to operate (or release from a previous operation) before its reading is applied to the lamp, and also disconnects the lamp before the state of A is changed by the next operation.

The second test in the same buzzing cycle is to short the break contact of the transfer. This occurs when U releases, connecting P3 to P4 and P5 to P7. The W make is then connected as usual in the circuit being tested (via the H1 make, U' and wiper 2) and the nodes previously connected to the back W' contact are shorted via the 3rd wiper of selector switch S16. In this part of the buzzing cycle the disagreement relay is connected via P and V' contacts (for timing margins similar to P' and V before) and the 5th wiper to the lamp corresponding to the W' or break contact. This lamp will fire, as before, if a disagreement occurs, indicating that the contact is necessary.
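The logic of the short test, together with the three-valued comparison it relies on, can be sketched in Python as follows. This is a hedged model of the logic only, not of the relay timing or wiper arrangement. The names `is_closed`, `disagreements` and `removable_contacts`, the contact-list representation, and the use of `None` for a "don't care" switch setting are all assumptions of the sketch. The same routine with `mode="open"` models the analogous test in which each contact is opened instead of shorted.

```python
from itertools import product

RELAYS = "WXYZ"
STATES = list(product((0, 1), repeat=4))  # (W, X, Y, Z)

def is_closed(contacts, state, forced=None):
    """Path test between P1 and P2.

    forced: optional (index, conducting) pair overriding one contact, which
    models shorting (True) or opening (False) that contact.
    """
    operated = dict(zip(RELAYS, state))
    adjacency = {}
    for index, (a, b, relay, kind) in enumerate(contacts):
        if forced is not None and forced[0] == index:
            conducting = forced[1]
        else:
            conducting = (operated[relay] == 1) == (kind == "make")
        if conducting:
            adjacency.setdefault(a, set()).add(b)
            adjacency.setdefault(b, set()).add(a)
    seen, stack = {"P1"}, ["P1"]
    while stack:
        node = stack.pop()
        if node == "P2":
            return True
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return False

def disagreements(contacts, spec, forced=None):
    """States where the circuit differs from the switch settings.

    spec maps state -> True (closed), False (open) or None (don't care);
    a don't-care state never registers a disagreement.
    """
    return [s for s in STATES
            if spec[s] is not None and is_closed(contacts, s, forced) != spec[s]]

def removable_contacts(contacts, spec, mode):
    """Indices of contacts that can be shorted (mode="short") or opened
    (mode="open") with no disagreement in any state."""
    force_value = (mode == "short")
    return [i for i in range(len(contacts))
            if not disagreements(contacts, spec, (i, force_value))]

# Hypothetical circuit: W make, X make, then Y make in parallel with Y break.
contacts = [
    ("P1", "n1", "W", "make"),
    ("n1", "n2", "X", "make"),
    ("n2", "P2", "Y", "make"),
    ("n2", "P2", "Y", "break"),
]
spec = {s: (s[0] == 1 and s[1] == 1) for s in STATES}
# Same requirement, but every state with Y operated is marked "don't care".
spec_dc = {s: (None if s[2] == 1 else spec[s]) for s in STATES}
```

In this made-up example the parallel Y and Y' contacts are redundant, so the short test reports both as removable. With the don't-care settings in `spec_dc`, the open test additionally finds that the Y make contact can be opened, which illustrates how don't-care states enlarge the set of possible simplifications.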
After selector switch S17 has run through all states (rows 1 to 16) it applies ground through wiper 2 to the magnet of selector switch S16, advancing it one step. The machine now applies the shorting test to the X and X' contacts connected to the second row of selector switch S16. Proceeding in this manner it tests all the contacts. On reaching row 13, the 6th wiper of selector S16 applies ground to its own coil through its own back contact. This causes it to step rapidly through the remaining positions until it reaches row 21, where it operates relay T. The first selector switch is meanwhile still being pulsed by the buzzer circuit. After T operates, the first time S17 reaches row 22, relay S operates and the buzzer stops. This completes the test.

If it is desired to hurry the machine through the latter part of a test (for example, if only a few of the available contacts are being used and these are near the top) the reset button S21 can be pressed. This causes S16 to run rapidly to the stop position (row 21).

7. The Open Test

The test for opening contacts proceeds exactly as the short test just described, except that having switch S18 in the open position opens wiper 3 of S16. This opens the short that was applied in the previous test to the nodes normally connected to the contact being tested. The relay therefore indicates the behavior of the circuits when the different contacts are opened.

8. The "Prove" Mode of Operation

When switch S18 is set in the "prove" position the machine indicates, by lighting some of the lamps L40 to [...], that certain contacts are necessary in any circuit which realizes the switching function set up on the sixteen three-position switches. This indication is obtained by moving switch S22 through its four possible positions. In the W position the machine tests whether W and/or W' contacts are necessary and, if so, lights the corresponding lamps, etc.
The method of operation is based on the following result in switching theory (stated for simplicity for the case of four variables). At least one W (make) contact is necessary in any realization of a given switching function if there are one or more states of the other relays (X, Y, and Z) such that when the X, Y and Z relays are in such a state, changing the W relay from unoperated to operated changes the function from open to closed. At least one W' (break) contact is necessary if there exists a state of the X, Y and Z relays such that when they are in this state, operating the W relay changes the circuit from closed to open. These are both obvious, since the only way by which operating the W relay alone could close a previously open circuit is by establishing an operating path through a make contact on the W relay, and similarly for the condition with a break contact.

The condition that a W contact is necessary can also be thought of geometrically in the following way. The sixteen states of the four relays can be thought of as the vertices of a four-dimensional cube. This cube consists of two three-dimensional subcubes, the first being the eight states of the X, Y, Z relays with W not operated, and the second, the eight states of the X, Y, Z relays with W operated. If there is any point in the "W unoperated" cube in which the circuit is open (closed) while being closed (open) in the corresponding point of the "W operated" cube, at least one W (W') contact is necessary.

The "Prove" part of the circuit can best be understood in terms of this geometrical picture. A two-terminal network with terminals a and b is set up in the machine, corresponding to this cube. Every vertex of the cube for which the circuit should be closed is connected to terminal a; all vertices for which the circuit should be open are connected to terminal b ("don't care" vertices are left floating).
When testing for the necessity of W or W' contacts, eight diodes are connected between corresponding points of the three-dimensional subcubes mentioned above. These point from the "W unoperated" subcube to the "W operated" subcube. Current will pass from terminal a to terminal b if and only if a W' contact is necessary. This is true since this conduction can take place only by entering the cube at a closed state (these being the only ones connected to terminal a), passing through a diode in the conducting direction (this requires that the closed state be in the "W unoperated" cube) and leaving the cube to terminal b at an open state. Thus the conditions for conduction from a to b are identical with the conditions for necessity of a W' contact. In a similar manner, it may be seen that the network will conduct from b to a if and only if a W contact is necessary.

In operation, the circuit is alternately tested for conduction in the two directions. The alternation is obtained by operation of the three-stage buzzer previously described. When P is operated, the circuit is tested for conduction from A to B. If this condition occurs, it fires the corresponding neon lamp (for the W, X, Y or Z make contact). When P is released, voltage is applied to the AB network in the reverse direction and, if conduction occurs, it fires the corresponding neon lamp (for the W', X', Y' or Z' break contact). These lamps remain fired until released either by turning off the main power or flipping the "evaluate-compare" switch S19 from one position to the other.

Although it has been explained that the circuit for doing these tests is laid out in the shape of a four-dimensional cube, the circuit diagram of Figure 3 is not drawn by the use of a direct projection of such a cube, but is laid out in a plane by a method due to W. Keister (The Design of Switching Circuits, D. Van Nostrand, 1951, p. 174), which simplifies its appearance.
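The necessity conditions stated above lend themselves to a short Python sketch. It checks the switching-theory condition directly over pairs of corresponding cube vertices rather than modelling the diode network. The function names `necessary_contacts` and `lower_bound`, and the use of `None` for a "don't care" vertex, are assumptions of the sketch.

```python
from itertools import product

RELAYS = "WXYZ"
STATES = list(product((0, 1), repeat=4))  # (W, X, Y, Z)

def necessary_contacts(spec):
    """Return the set of (relay, kind) contacts every realization must contain.

    spec maps state -> True (closed), False (open) or None (don't care).
    A make contact on a relay is necessary if, for some state of the other
    three relays, operating that relay changes the function from open to
    closed; a break contact is necessary if it changes from closed to open.
    Don't-care vertices never establish a requirement.
    """
    needed = set()
    for k, relay in enumerate(RELAYS):
        for rest in product((0, 1), repeat=3):
            unoperated = rest[:k] + (0,) + rest[k:]
            operated = rest[:k] + (1,) + rest[k:]
            before, after = spec[unoperated], spec[operated]
            if before is False and after is True:
                needed.add((relay, "make"))
            elif before is True and after is False:
                needed.add((relay, "break"))
    return needed

def lower_bound(spec):
    """Number of lamps lit, which is a rigorous lower bound on contacts."""
    return len(necessary_contacts(spec))

# Hypothetical specifications for illustration.
spec_and = {s: (s[0] == 1 and s[1] == 1) for s in STATES}   # W and X
spec_xor = {s: (s[0] != s[1]) for s in STATES}              # W exclusive-or X
spec_parity = {s: (sum(s) % 2 == 1) for s in STATES}        # odd parity of W, X, Y, Z
```

For the made-up functions above, the sketch reports that W and X each need a make contact for the AND function, that all four of W, W', X, X' are needed for the exclusive-or, and that all eight lamps would light for the four-variable parity function. As the text notes, this count is only a lower bound: the true minimum may be larger.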
It can easily be verified that by putting switch S22 in any one of its four positions the circuit in Figure 3 reduces to a 4-dimensional cube with 8 diodes joining its two halves. However, the manner in which these 4 sets of 8 diodes each were combined to give a total of only 14, while at the same time using only 8 decks of the switch S22, may be of interest. It can be applied to give similar economies in the design of analogous circuits for cubes of any dimension.

This method depends on some concepts due to R. W. Hamming (Bell System Technical Journal, 29, pp. 147-160, April, 1950). It is possible to divide the vertices of an n-cube into two mutually exclusive and collectively exhaustive classes, called parity classes, depending on whether the number of coordinates having the value 1 is even or odd. If a point belongs to one parity class, all of the points which have distance 1 from it (and hence differ in only one coordinate from it) are in the opposite parity class. This means that every edge of the cube connects vertices of opposite parity classes. Since in every position of S22 the diodes are connected along edges of the cube, it means that it is necessary to be able to connect diodes only between points of opposite parity classes. Thus the diodes are all connected to the points of one parity class, and the decks of switch S22 are connected to the points of the other class. If one diode pointing toward and one pointing away from each point of the even parity class is provided, then the switch contacts can connect each point of the other parity class to the other end of the proper one of these two diodes. In the actual circuit not quite this many diodes are used, since the points 0000 and 1111 require only one of the two diodes.

9. Notes and Comments

The small size and portability of this machine depend on the fact that a mixture of relay and electronic circuit elements was used.
The gas diodes are particularly suited for use where a small memory element having an associated visual display is required, and the relays and selector switches are particularly suited for use where the ability to sequence and interconnect using only a small weight and space is required. In all, the relay circuit analyzer uses only 24 relays, 2 selector switches, 48 miniature gas diodes, and 14 germanium diodes as its logical elements.

It may be of interest to those familiar with general-purpose digital computers to compare this method of solution of this problem on such a small, special-purpose machine with the more conventional method of coding it for solution on a high-speed general-purpose computer. One basic way in which the two methods differ is in the directness with which the circuits being analyzed are represented. On a general-purpose computer it would be necessary to have a symbolic description of the circuit, probably in the form of a numerical code describing the interconnections of the circuit diagram, and representing the types of contacts that occur in the various parts of the circuit by means of a list of numbers in successive memory locations of the computer. On the other hand, the relay circuit analyzer represents the circuit in a more direct and natural manner, by actually having a copy of it plugged up on the front panel.

This difference in the directness of representation has two effects. First, it would be somewhat harder to use the general-purpose computer, because the steps of translating the circuit diagram into the coded description and of typing it onto the input medium of the computer would be more complicated and lengthy than the step of plugging up a circuit directly. The second effect is in the relative number of logical operations (and hence, indirectly, the time) required by the two kinds of machines.
To carry out the fundamental step in this procedure of determining whether the given circuit (or some modification of it obtained by opening or shorting a contact) is open or closed for some particular state of the relays requires only a single relay operate time for the relay circuit analyzer. However, the carrying out of this fundamental step on a general-purpose digital computer would require going through several kinds of subroutines many times. There would be several ways of coding the problem, but in a typical one of them the computer would first go through a subroutine to determine whether a given contact were open or closed, repeating this once for each contact in the circuit, and then would go through another subroutine once for each node of the network. Altogether this would probably involve the execution of several hundred orders on the computer, although by sufficiently ingenious coding this might be cut down to perhaps 100. Since each order of a computer takes perhaps 100 times the duration of a single logical operation (i.e., a pulse time, if the computer is clock-driven), it turns out that what takes 1 operation time on one machine takes perhaps 10,000 on another. Since 10,000 is approximately the ratio between the speed of a relay and of a vacuum tube in performing logical operations, this gain of about 10,000 from the directness of the representation permits this relay machine to be as fast as a general-purpose electronic computer.

This great disparity between the speeds of a general-purpose and of a special-purpose computer is not typical of all kinds of problems, since a typical problem in numerical analysis might only permit of a speed-up by a factor of 10 on a special-purpose machine (since multiplications and divisions required in the problem use up perhaps a tenth of the time of the problem).
However, it seems to be typical of combinatorial problems that a tremendous gain in speed is possible by the use of special rather than general-purpose digital computers. This means that the general-purpose machines are not really general in purpose, but are specialized in such a direction as to favor problems in analysis. It is certainly true that the so-called general-purpose machines are logically capable of solving such combinatorial problems, but their efficiency in such use is definitely very low. The problems involved in the design of a general-purpose machine suitable for a wide variety of combinatorial problems seem to be quite difficult, although certainly of great theoretical interest.

10. Conclusion

An interesting feature of the relay circuit analyzer is its ability to deal directly with logical circuits in terms of 3-valued logic. There would be considerable interest in techniques permitting easy manipulation on paper with such a logic, because of its direct application to the design of economical switching circuits. Even though such techniques have not yet been developed, machines such as this can be of value in connection with 3-valued problems.

Whether or not this particular kind of machine ever proves to be useful in the design of practical relay circuits, the possibility of making machines which can assist in logical design procedures promises to be of value to everyone associated with the design of switching circuits. Just as the slide rule and present-day types of digital computers can help perform part of the routine work associated with the design of linear electrical networks, machines such as this may someday lighten much of the routine work associated with the design of logical circuits.

Attached: Photograph No. 196492; Figures 1, 2 and 3

C. E. SHANNON
E. F. MOORE

FIGURE 1. THE RELAY CIRCUIT ANALYZER WAS ABLE TO SIMPLIFY THIS CIRCUIT, REMOVING ONE CONTACT, IN LESS THAN TWO MINUTES TOTAL TIME. CAN YOU DO AS WELL?
[FIG. 2. MAIN CIRCUIT DIAGRAM OF RELAY CIRCUIT ANALYZER. Bell Telephone Laboratories, Inc., B-349291. Legend: relay coil, front contact, back contact, selector switch.]

[FIG. 3. PROVE CIRCUIT OF RELAY CIRCUIT ANALYZER. Bell Telephone Laboratories, Inc., B-349292.]

THROBAC - CIRCUIT OPERATION

The central part of the Throbac circuit is a relay accumulator which can count up to eighty in a modified Roman numeral system. The accumulator is arranged so that it is possible to add or subtract I, V, X or L to the contents of the accumulator. It consists of seven stages of W-Z circuits. The first three stages W1-Z1, W2-Z2 and W4-Z4 accumulate "I's". These stages are arranged to count up to four and recycle to zero at the fifth I. Thus, within these stages either zero, one, two, three or four "I's" will be registered. The number of "I's" appears in binary form in the three stages of W-Z.
The next W-Z combination accumulates "V's", either zero or one V being registered here. The final three stages WX1-ZX1, WX2-ZX2 and WX4-ZX4 accumulate "X's" from zero up to seven.

If the relay F is operated, the accumulator is arranged to add; if F is released, to subtract. Supposing F operated, closing P_I adds I to the contents of the accumulator. Closing P_V adds V, P_X adds X and P_L adds L. This may be verified by tracing out the circuit paths into the W-Z circuits in the various cases. For example, if the accumulator has zero in it, all W's and Z's are released, and when P_I is closed a ground passes through a chain of contacts P_I-F-Z[...]-F to pulse the W1-Z1 pair, and this is the only W-Z pair to receive a ground. If, instead, P_L had been pulsed, the WX4-ZX4 pair and the WX1-ZX1 pair would both receive ground, thus registering L (XXXX + X). A study of the circuit will show that in all cases it adds or subtracts (according to the state of F) I, V, X or L when P_I, P_V, P_X or P_L is operated.

At the bottom of this circuit a connection leads out to control the C relay. This connection will be seen to carry a ground when a number is added to the accumulator which causes it to overrun its limit either by addition, giving a number greater than seventy-nine, or, by subtraction, a number less than zero. In these cases the carrying to or borrowing from what would be the next column goes out on the lead in question to control the relay. This relay, to be described later, indicates the end of a division.

The number registered in the accumulator is displayed on the panel by means of a series of thirteen lights. These lights are controlled by contact networks on the W-Z relays of the accumulator. The contact networks translate from the modified Roman numeral notation to the standard one. The part of the number which is a multiple of ten appears in the three left columns of lights [...].
The part of the number registered which is less than ten appears in the four right columns of lights. As an example, suppose the number registered is LXIV (64). In the accumulator the W-Z pairs W4-Z4 (IIII), WX4-ZX4 and WX2-ZX2 (XXXXXX) will be operated and other W-Z pairs released. In the accumulator light circuit it will be found that lights [...] will receive a ground and be illuminated, displaying the number LXIV.

The sequencing for adding or subtracting a number entered in the keyboard into the accumulator is carried out chiefly by stepping switch A. For such an addition or subtraction, this stepper sweeps across the keyboard, starting from the right-hand column and sequentially adding or subtracting the numbers registered in each column. The addition sequence is started by pressing the ADD button, which causes P to operate and lock in through a back contact on K. The operation of P causes the buzzer relay B to start operating and releasing at about ten cycles per second. When B closes it pulses the stepping coil of stepper A, moving it ahead one notch. The release of B puts a ground on the wipers of the stepper and, therefore, on the first vertical connection through the keyboard switches. Let us suppose that the number XLVI is entered in the keyboard in the four right-hand columns. I is then registered in the rightmost column and the ground from the stepper passes through this I push button to operate the P_I relay. The F relay has been operated by P and therefore I is added to the previous contents of the accumulator. On the next cycle of the buzzer, the stepper moves to the next column and operates the P_V relay, which adds V into the accumulator. P_V also causes E to operate and lock in through [...]. The purpose of this is to cause any further I's to be subtracted rather than added.
On the next cycle of the buzzer, ground is applied to the third vertical of the keyboard and, because of the L entered there, operates the P_L relay. This adds L to the accumulator and also operates the S relay, which also locks in through [...]. The operation of S signifies that an L has occurred and consequently any X's or V's now encountered on the keyboard must be subtracted. On the next cycle of the buzzer, the fourth vertical receives ground and, because of the X in this column, P_X operates. Since S is closed, the relay H also operates, releasing F and making the accumulator subtract instead of add. The timing of these relays is adjusted so that F releases before the P_X pulse could add into the accumulator. X is therefore subtracted. On the next three cycles of the buzzer, no further numbers are encountered and the accumulator does not change. On the eighth cycle, the wipers pass a ground to the K relay, which locks in momentarily, and also to the reset coil of the stepper. The operation of K releases relays P, E and S and also disconnects the buzzer and the wipers. The reset coil allows the wipers to return to their normal position and, since they have been disconnected by K, they have no effect as they pass over the keyboard columns. When the wipers reach their normal position they open the off-normal switch of the stepper. This releases K and the addition operation is complete.

The process of subtraction is essentially the same. Pressing the subtract button causes M to operate and lock up, which starts the buzzer and the stepping operations. In this case, however, F is normally released, so that numbers encountered in the keyboard are normally subtracted. However, when a smaller number is encountered after a larger one the relay F will operate, causing it to be added.

Multiplication is obtained by successive addition.
If the MV button is pressed, the machine adds the contents of the keyboard into the accumulator V times; if the MX button is pressed, X times. This counting is controlled by stepper B. If the MI button is pressed, the keyboard contents are added or subtracted depending on whether the MV or MX buttons have been previously operated. Suppose VIII is to be multiplied by IV. VIII is entered in the keyboard and first the MV and then the MI push buttons pressed. When the MV button is pressed, relay MV operates and locks in through Q'. The relay T also operates, locking in through the Clear Upper key. The relay T signifies that I's occurring later in the multiplier must be interpreted as negative. The operation of MV causes the P relay to operate and start an addition operation. When stepper A reaches the eighth point, K operates, causing the stepping coil of stepper B to receive a ground (through the MV make). When stepper A resets to normal, P again operates, again adding the keyboard contents into the accumulator and advancing stepper B at the end of the addition. This process continues until stepper B reaches its fifth point. There the ground on the wipers operates relay Q, which releases MV and stops the series of additions. Q locks in and applies ground to the reset coil of stepper B, returning it to normal. When it reaches normal, the off-normal contacts are opened and Q is released. Next the MI button is pressed. Since T is in (due to the previous operation of MV), this causes H to operate and the machine subtracts the keyboard contents from the accumulator. This completes the multiplication. The MX button produces a sequence similar to the MV button, except that stepper B must go to the tenth point instead of the fifth to operate Q and stop the series of additions. If another multiplication is to be performed, the Clear Upper button should be pressed. This releases T and resets stepper B to normal if for some reason it is not already there.
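The right-to-left keyboard scan and the multiplication by repeated addition described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not of the relay circuit: the names `keyboard_value` and `multiply` are hypothetical, and the rule used here (subtract a numeral when a larger one has already been met further to the right) is a simplification assumed to stand in for the relay flags described in the text.

```python
# Values of the numerals mentioned in the description of the keyboard scan.
VALUES = {"I": 1, "V": 5, "X": 10, "L": 50}


def keyboard_value(numeral):
    """Scan a Roman numeral from the right-hand column, as stepper A does."""
    total = 0
    largest = 0  # largest numeral met so far during the scan
    for symbol in reversed(numeral):
        value = VALUES[symbol]
        if value < largest:
            total -= value  # a smaller numeral to the left of a larger one
        else:
            total += value
            largest = value
    return total


def multiply(multiplicand, multiplier):
    """Multiply by repeated addition, taking multiplier numerals right to left."""
    step = keyboard_value(multiplicand)
    accumulator = 0
    largest = 0
    for symbol in reversed(multiplier):
        count = VALUES[symbol]
        sign = -1 if count < largest else 1  # e.g. the I of IV counts negatively
        for _ in range(count):
            accumulator += sign * step
        largest = max(largest, count)
    return accumulator
```

With these definitions, `keyboard_value("XLVI")` follows the same four steps as the example in the text (add I, add V, add L, subtract X), and `multiply("VIII", "IV")` performs five additions followed by one subtraction.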
Division is performed by successive subtraction. The dividend is entered in the accumulator and the divisor in the keyboard. When the divide button is pressed, relay E operates and locks in through P' or K'. C is normally out and E, therefore, causes M to operate and lock in, starting a subtraction. If, during this subtraction, the accumulator does not run through zero, C will not operate and another subtraction will occur, since M will again operate as soon as K releases. At each subtraction of this sort the operation of K at the end of the subtraction energizes the stepping coil of stepper B, advancing it one step. Eventually in this subtraction process the contents of the accumulator will go negative. This causes C to operate and indicates that one too many subtractions have been performed. The last subtraction is not counted on stepper B since its operating path passes through C'. The operation of C causes the next operation to be an addition, since the next ground when K releases is placed on P rather than M. The machine therefore goes through one addition sequence (compensating in the accumulator for the extra subtraction). At the eighth point of this sequence K operates and, since P is operated, the hold on E opens and E releases. This stops any further additions or subtractions and also releases the C relay for the next division. The stepper B will be at a level equal to the number of subtractions (not counting the extra one) and its position therefore is the quotient desired. The value of this quotient is indicated on the quotient lights, which are wired to the contacts of the stepper in such a way as to indicate in Roman numerals the position of the wipers. This dial is cleared by pressing the Clear Upper button, which operates the reset coil of stepper B.

C. E. Shannon
April 9, 1953

TOWER OF HANOI
C. E. Shannon

The Tower of Hanoi machine automatically solves a well-known puzzle constructed as follows.
There are three pegs standing upright in a horizontal plate. On the first peg are a number of disks of graduated sizes. The problem is to move all these disks to the third peg subject to the rules that (1) only one disk can be moved at a time, and (2) a disk can never be placed on top of a smaller disk. This puzzle has been treated in the literature. It can be readily proved by induction that with n disks, 2^n - 1 moves are necessary. For suppose this formula is true up to n-1. With n disks, in order to move the largest one to the third peg it is necessary that all the other disks be on the second peg in proper order. This, by assumption, requires 2^(n-1) - 1 moves. Moving the largest disk requires one more and moving the n-1 disks from the second to the third peg, again by the inductive hypothesis, requires 2^(n-1) - 1 moves. Consequently the entire operation requires 2^n - 1 moves. Since the formula is true for n = 1, it holds in general. The argument also shows how to build up a solution for any n from n-1, and hence, eventually, from the n = 1 case. For n = 6 (the case handled by the machine) the solution is given by the following table.
000000  000000      100000  211111
000001  000001      100001  211112
000010  000021      100010  211102
000011  000022      100011  211100
000100  000122      100100  211200
000101  000120      100101  211201
000110  000110      100110  211221
000111  000111      100111  211222
001000  002111      101000  210222
001001  002112      101001  210220
001010  002102      101010  210210
001011  002100      101011  210211
001100  002200      101100  210011
001101  002201      101101  210012
001110  002221      101110  210002
001111  002222      101111  210000
010000  012222      110000  220000
010001  012220      110001  220001
010010  012210      110010  220021
010011  012211      110011  220022
010100  012011      110100  220122
010101  012012      110101  220120
010110  012002      110110  220110
010111  012000      110111  220111
011000  011000      111000  222111
011001  011001      111001  222112
011010  011021      111010  222102
011011  011022      111011  222100
011100  011122      111100  222200
011101  011120      111101  222201
011110  011110      111110  222221
011111  011111      111111  222222

The first column gives the binary numbers from 0 to 63. The second column describes the positions of the disks. For example, 000000 means that all disks are on peg 0. The fifth entry 000122 means that the three largest disks are on peg 0, the next smaller disk on peg 1, and the two smallest disks on peg 2. The numbers in the second column are related in a peculiar manner to the binary numbers in the first column and can be calculated from them. The process can best be described by an example. Take, for instance, the binary number 010110. The following calculation is performed.

+ - + - + -
0 1 0 1 1 0
0 2 2 1 2 2
0 1 2 0 0 2

The columns here alternate + and -. The second row 022122 is obtained by summing the first row horizontally mod 3 with + or - sign depending on the column. Thus 0=0, 2=0-1, 2=0-1+0, 1=0-1+0-1, 2=0-1+0-1+1 and 2=0-1+0-1+1-0 (all mod 3). The third row is obtained from the second by alternately adding and subtracting the first row from it. This row is the corresponding position of the disks in the solution of the puzzle.
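The calculation in the example can be written out as a short Python sketch. The function name `disk_positions` and its arguments are illustrative only; the arithmetic is the alternating running sum mod 3 described above, with the largest disk listed first.

```python
def disk_positions(count, disks=6):
    """Return the peg (0, 1 or 2) of each disk, largest first, at a given count."""
    # First row of the calculation: the binary digits of the count.
    bits = [(count >> (disks - 1 - i)) & 1 for i in range(disks)]
    positions = []
    running = 0  # second row: running sum with alternating signs, mod 3
    for column, bit in enumerate(bits):
        sign = 1 if column % 2 == 0 else -1  # columns alternate + and -
        running = (running + sign * bit) % 3
        # Third row: add or subtract the first row once more, mod 3.
        positions.append((running + sign * bit) % 3)
    return positions
```

For the count 010110 the running sums are 0, 2, 2, 1, 2, 2 and the function returns the positions 0, 1, 2, 0, 0, 2, the entry shown in the table.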
It can be shown that this relation holds in general. The Tower of Hanoi relay circuit is based on this curious relation. The machine basically consists of a binary counter (six stages of W-Z counters) which counts from 0 to 63. Contacts on these relays are connected in a network which controls a set of eighteen lights. There are three lights for each of the six disks, one on each of the three pegs. At a given time, one of these three will be on, indicating the position of the corresponding disk. As the counter proceeds through its count, the lights are switched to indicate the process of the solution. The circuit of the machine is shown in Fig. 1. The right hand network controls the lights. It will be seen that this consists of a symmetric function lattice in which the stages alternately add and subtract mod 3. The ground coming in at the bottom of this circuit will appear in columns 0', 1', 2' according to the first number computed in the above calculation (i.e. 0'2'2'1'2'2' in the example given). The further calculation (012002 in the example) is carried out by the single stage mod 3 circuits attached to the basic mod 3 lattice. It is interesting in this circuit that when one of the larger disks is moved the lamps corresponding to smaller disks receive their operating current through a path which is switched. The counting process, however, is so rapid that they appear to be continuously illuminated. The control circuit at the left of the figure contains a three-position key switch. In the center position, the machine stops. In the top position, it causes the buzzer B to operate the counter and therefore proceed through the solution at about two steps per second. When the count reaches sixty-three, the buzzer stops. If the key switch is depressed to the lower position (non-locking), the counter is advanced one count. By moving the switch between the center and the lower positions the solution can be observed step by step.
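The inductive argument given earlier for the number of moves also describes a recursive procedure, which can be sketched as follows. This is an illustration of the induction only, not of the relay machine, which uses the counter relation instead; the function name `solve` and the peg numbering 0, 1, 2 are assumptions made for the sketch.

```python
def solve(n, source=0, spare=1, target=2, moves=None):
    """List the moves (disk, from_peg, to_peg) that transfer n disks to target."""
    if moves is None:
        moves = []
    if n == 0:
        return moves
    # Move the n-1 smaller disks out of the way onto the spare peg.
    solve(n - 1, source, target, spare, moves)
    # Move the largest of the n disks.
    moves.append((n, source, target))
    # Bring the n-1 smaller disks from the spare peg onto the largest disk.
    solve(n - 1, spare, source, target, moves)
    return moves
```

The length of `solve(6)` can be compared with 2^6 - 1, the number of counts the machine steps through after its starting position.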
Mathmanship
or
How to Give an Explicit Solution Without Actually Solving the Problem

After reading several weighty papers giving formulas which assume only prime values, I felt moved to develop a few further results of the same type.

Theorem 1. There exists a unique real positive number X < 1 such that [2^n X] - 2[2^(n-1) X] = 0 if n is composite, 1 if n is prime. Here [x] means, as usual, the largest integer in x. The value of X is .414 ....

Theorem 2. There exists a unique real positive number μ < f such that the nth prime is given by - IS?* 1 u] - 2 2 * 1 L2^ u] Note the improvement over previous results - this formula gives all the primes, not just some of them.

For analysts who find the bracket symbol a little suspect, we have the following:

Theorem 3. There exists a real number η such that sin 2^n η is positive or negative according as n is prime or composite.

Theorem 4. There exists a real number δ such that - tan 2^ 5| <^

Proofs are left as an exercise for the reader.

C. E. SHANNON
6/3/53

Bell Telephone Laboratories, Incorporated
Cover Sheet for Technical Memorandum
Subject: The Relay Circuit Synthesizer - Case 20878
MM 53-140-52, 53-180-52
Date: November 30, 1953
Authors: C. E. Shannon, E. F. Moore
Filing Subject: Switching Theory
Copies to: Case File (HWB-WOB-JBF) (BDH); Central Files (4); 1 - M. L. Almquist; 2 - H. W. Bode; 3 - R. Bown; 4 - E. Bruce; 5 - A. J. Busch; 6 - A. B. Clark; 7 - W. H. Doherty; 8 - E. B. Ferrell; 9 - J. B. Fisk; 10 - H. T. Friis; 11 - T. C. Fry; 12 - G. W. Gilman; 13 - D. W. Hagelbarger; 14 - B. D. Holbrook; 15 - A. C. Keller; 16 - F. A. Korn; 17 - W. D. Lewis; 18 - C. A. Lovell; 19 - M. B. McDavitt; 20 - J. Meszar; 21 - R. K. Potter; 22 - F. J. Singer; 23 - S. H. Washburn; 24 - I. G. Wilson

ABSTRACT
The Relay Circuit Synthesizer is a machine to aid in switching circuit design. It is capable of designing two terminal circuits involving up to four relays in a few minutes. The solutions are usually minimal.
The machine, its operation, characteristics and circuits are described.

The Relay Circuit Synthesizer - Case 20878
MM-53-140-52, MM-53-180-52
November 30, 1953
MEMORANDUM FOR FILE

Purpose and Operation

The Relay Circuit Synthesizer (Photograph 214142) is a machine to aid in the design of a certain class of relay circuits. The type of circuits it handles are two-terminal switching circuits involving up to four relays or (by simple alterations) other two-valued elements. The desired characteristics of the circuit to be designed are entered in a set of sixteen three-position switches on the front panel of the machine. After a period of computation, averaging about five minutes, the machine stops and displays a circuit satisfying the requirements. The circuit is displayed in geometric form on a card in an associated card display mechanism (Photograph 214140). The labels of the contacts on this card must, however, be interpreted in accordance with indicating lights on the front panel of the machine to obtain the proper answer to the design problem. In about eighty per cent of the possible problems that can be set up on the machine, the solution it gives will be minimal in contacts, i.e., the number of contacts in the circuit cannot be reduced. In the remaining twenty per cent, the designs cannot be simplified by more than one contact and may, in fact, be minimal. The sixteen input switches correspond to the sixteen possible states of the four relays in the circuit being designed. Each of these switches has three positions labeled "open," "don't care" and "closed". If, for a given state of these relays, it is desired that the circuit be open, the corresponding switch is set in the "open" position. Similarly for the "closed" position.
If it does not matter whether the circuit be open or closed in this state, the switch is set at "don't care." The Synthesizer takes advantage of any switches in the "don't care" position in attempting to reduce the number of contacts used in the final circuit. It fills in these unspecified states in such a way as to minimize contact requirements. This ability to handle partially specified switching problems is one of the main features of the Synthesizer and enables it to solve problems for which analytic methods are at present ill-adapted. In addition to the direct circuit designing procedure outlined above, the Synthesizer is equipped with controls for other modes of operation. It may be run at low speed for demonstration purposes, it may be set up to find all the circuits in its card file satisfying the requirements (not just the one with the smallest number of contacts) and it may be used to determine various mathematical properties associated with switching functions. By changing the paper tape and the card file used (but without any internal change within the electrical part of the machine) it can be made to solve design problems involving diode circuits instead of relay contact circuits. By a still different tape and set of cards it can minimize the number of transfers in relay circuits instead of the number of contacts. With suitable tape and card file, it can solve a variety of other similar problems. The Synthesizer represents a first step toward machine design of switching circuits. Unfortunately, although the method used in the Synthesizer may be generalized in principle to circuits involving five or more variables, the time for solution increases at an alarming rate. With five variables it would take many thousand times as long to obtain a solution. The card file and the tape would be about two thousand times their present size and would require many man years to construct.
Consequently, a direct generalization of the Synthesizer is hardly indicated, even with the high speeds available in electronic computing gear.

Speed of Solution With Random Problems

An idea of the time required for the Synthesizer to solve problems may be obtained from some tests with random settings of the input switches. Using a book of random numbers, ten sets of sixteen random binary digits were obtained. These were set up as input switch settings using 0 to mean closed and 1 open, and the time required for the machine to solve each of these problems was measured. The following table gives the results of this test.

[The sixteen binary digits (switch settings) listed for each problem are not legible in this copy.]

Solution       Transformation                 No. of      Time of
Circuit No.                                   Contacts    Solution
#279           w'->w, x->z, y->y, z'->x          8        4 min 10 sec
#177           w'->x, x->y, y->z, z->w           6        1 min 10 sec
#306           w->z, x'->y, y'->w, z'->x        10        7 min 20 sec
#261           w->z, x'->w, y->y, z'->x         10        7 min 7 sec
#212           w->x, x'->w, y->y, z->z          11        9 min 6 sec
#137           w->w, x'->x, y->y, z->z           9        6 min 32 sec
#75            w->x, x->z, y'->y, z->w           9        6 min 10 sec
#240           w'->y, x'->w, y'->x, z->z         5        38 sec
#193           w->z, x'->y, y->w, z->x           8        4 min 30 sec
#34            w->x, x'->z, y->w, z->y           9        5 min 50 sec

The Solution Circuit Number refers to the Table in MM-52-180-45, E. F. Moore, "A Table of Four Relay Two Terminal Contact Networks". The Transformation indicates the required change of variables in interpreting the numbered circuit of this Table. The average solution time for these ten completely specified random functions was 5 min. 15 sec., and the average number of contacts in the solution was 8.5. A second test was run with partially specified random functions.
Again using the Table of Random Numbers, four switches were chosen at random for "don't care" settings, the remaining switches being given random "open" or "closed" settings. This was done four times, leading to the following results:

[The switch settings listed for each problem (binary digits, with D = "Don't Care") are not legible in this copy.]

Solution       Transformation                 No. of      Time of
Circuit No.                                   Contacts    Solution
#334           w->w, x->x, y->y, z->z            6        3 min 5 sec
#189           w'->w, x->z, y->y, z->x           7        6 min 30 sec
#178           w->y, x'->w, y->z, z'->x          8        7 min 25 sec
#58            w->y, x->w, y'->z, z'->x          3        12 sec

The average time of solution for these problems with four unspecified states was 4 min. 20 sec., with an average of 6 contacts. Finally, a test was run with random problems having eight unspecified ("don't care") states. These results were as follows:

Solution       Transformation                 No. of        Time of
Circuit No.                                   Contacts      Solution
#204           w->w, x->z, y->y, z->x         (illegible)   55 sec
#179           w->y, x->x, y'->z, z->w           6          2 min 55 sec
#5(illegible)  w->y, x->x, y->w, z->z         (illegible)   40 sec
#79            w'->y, x'->z, y->x, z->w       (illegible)   3 min 15 sec

The average solution time here was 1 min. 56 sec., and the average number of contacts 4.5. The following table summarizes these average figures:

                     Completely      Unspecified     Unspecified
                     specified       in 4 states     in 8 states
Average time         5 min 15 sec    4 min 20 sec    1 min 56 sec
Average number       8.5             6               4.5
of contacts

With still more "don't care" states the solution time and average number of contacts would undoubtedly decrease still further.

General Theory of Operation

The Relay Synthesizer deals with Boolean functions of four variables. Each of the variables has two possible values, 0 or 1; in conjunction there are 2^4 = 16 sets of values or "states" of the variables. For each of these states, a function of these variables can be either 0 or 1. Thus there are 2^16 = 65,536 different Boolean functions of four variables.
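The counts in this passage can be checked by machine. The sketch below also counts how many classes remain when functions that differ only by permuting or negating variables are treated as one, which is the grouping the memorandum turns to next. The counting method (Burnside's lemma) is a choice made for this sketch and is not taken from the memorandum; the function name `count_types` is hypothetical.

```python
from itertools import permutations


def count_types(n=4):
    """Count Boolean functions of n variables up to permuting and negating variables."""
    states = range(2 ** n)
    fixed_total = 0
    group_size = 0
    for perm in permutations(range(n)):
        for mask in states:

            def image(state):
                # Negate the variables selected by mask, then permute the variables.
                state ^= mask
                return sum(((state >> i) & 1) << perm[i] for i in range(n))

            # A function is unchanged by this operation exactly when it is
            # constant on each cycle of states, so 2**cycles functions are fixed.
            seen = set()
            cycles = 0
            for start in states:
                if start not in seen:
                    cycles += 1
                    state = start
                    while state not in seen:
                        seen.add(state)
                        state = image(state)
            fixed_total += 2 ** cycles
            group_size += 1
    # Burnside's lemma: number of classes = average number of fixed functions.
    return fixed_total // group_size
```

Here `2 ** (2 ** 4)` gives the 65,536 functions, the loops run over 24 x 16 = 384 operations, and `count_types(4)` gives the number of classes.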
It is known that these 65,536 functions can be subdivided into 402 classes or "types" of functions. Two functions are said to be of the same type if one may be obtained from the other by negating some of the variables or permuting some of the variables or both. Thus the function w + x'(y+z) is of the same type as x' + z(w+y') or w' + y(x'+z'). All functions of a given type present substantially the same design problem. If a good circuit is found for one of them, it applies equally to all other functions of the same type, for it is necessary only to relabel contacts properly and it will represent these other functions. In the memorandum referred to above, circuits are given for these 402 types of functions. At present writing, 331 of these have been proved to be minimal in contacts; the remaining 71 are known to be within one contact of being minimal. This catalog of circuits is a key part of the design procedure in the Relay Synthesizer. The reader may wonder why the Synthesizer is necessary for designing circuits when such a catalog is available. Why not merely find the circuit corresponding to the desired function in the catalog? The answer is that it is not at all easy to find the type or class to which a given function belongs even when the function is completely specified. If the desired function is not completely specified (has one or more "don't care" states) there will in general be many types of functions consistent with the requirements, and it becomes extremely difficult to locate these in the catalog. The Synthesizer is, in fact, a machine for determining the type of a fully specified function and (in the partially specified case) the possible type having the least number of contacts in its catalog circuit. A block diagram of the Synthesizer is shown in Figure 1, and indicates the main functional organization. The specifications of the desired circuit are set up on the input switches in the right-hand box.
The catalog of the 402 types of functions appears on a paper tape in the left-hand Tape Input box. Each function occupies six lines of tape. The first four lines give the states for which the function is closed. The fifth line gives the number (in binary form) of closed states for the function, and the sixth line contains a special hole marking the end of data relating to this function, i.e., it acts as a punctuation mark separating functions on the tape. In solving a particular problem, the tape functions are studied one by one in the machine. All permutations and negations of a particular tape function are compared with the desired specifications as set up on the input switches. When an exact match is found the machine stops, and the tape function together with the permutation being applied to it represent a solution to the problem. In the block diagram this is carried out as follows: The tape function is stored in the memory relays. A permuting-negating network applies the equivalent of the various possible permutation and negation operations to these data. The results of each permutation-negation operation are compared with the input switches in a comparison circuit to see if a match has occurred. If not, an error signal is fed back to the permutation sequencer, causing it to advance to the next permutation operation which is, in turn, compared, etc., until all of the 384 possible permutations and negations have been tested. Because of short-cut circuits to be described later, the machine frequently skips many of these, reducing the solution time considerably. When the set of operations on a particular function is exhausted, the permutation sequencer sends a signal back to the tape driving circuit, and the next function is read into the memory for test. This signal also causes the card display device to drop another card from its stack.
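The search loop just described (take each tape function in turn, try every permutation and negation, stop at the first agreement with the switches) can be sketched as follows. Everything named here is an assumption made for illustration: a function is held as a list of sixteen values (1 = closed, 0 = open), a "don't care" switch is `None`, and the two-entry `CATALOG` is a made-up stand-in for the tape, not the catalog of the memorandum.

```python
from itertools import permutations


def transforms(table):
    """Yield (perm, mask, new_table) for all 384 permutations and negations."""
    for perm in permutations(range(4)):
        for mask in range(16):
            new_table = [0] * 16
            for state in range(16):
                moved = state ^ mask  # negate the selected variables
                moved = sum(((moved >> i) & 1) << perm[i] for i in range(4))
                new_table[moved] = table[state]
            yield perm, mask, new_table


def agrees(candidate, switches):
    """True if no switch set to open or closed disagrees with the candidate."""
    return all(want is None or want == got
               for want, got in zip(switches, candidate))


def synthesize(switches, catalog):
    """Return (name, perm, mask) of the first catalog function that fits."""
    for name, table in catalog:
        for perm, mask, candidate in transforms(table):
            if agrees(candidate, switches):
                return name, perm, mask
    return None


# Hypothetical two-entry catalog, ordered by number of contacts.
CATALOG = [
    ("one contact", [s & 1 for s in range(16)]),
    ("two contacts in series", [(s & 1) & ((s >> 1) & 1) for s in range(16)]),
]
```

Because the catalog is searched in order, a problem with many `None` entries tends to stop at an early, small entry, which is the behavior the memorandum attributes to the ordering of functions on the tape.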
The card displayed always corresponds to the function being tested in the machine and shows the most efficient known circuit for that function. The permutation indicator is controlled by the permutation sequencer and indicates in lights the permutation currently being tested. When the machine stops at a solution, these lights show what permutation and negation must be applied to the circuit on the card to solve the problem at hand. In the problems involving "don't cares," the Synthesizer could be used to successively find all of the solutions, but to use all this information in designing a circuit, it would be necessary to compare all the circuits obtained, and see which one is preferred. Since the grounds for preferring one circuit over another have been taken to be economy of contacts, the necessity for this comparison step has been eliminated by arranging the functions on the tape in order of increasing number of contacts, so that the first solution arrived at will automatically be the preferred one. Arranging the functions on the tape in terms of any other criterion will cause the Synthesizer to design circuits based on this criterion. If, for instance, it is desired to design relay circuits using as few springs as possible, or to design diode logic circuits using as few diodes as possible, it is only necessary to arrange the functions on the tape in order of number of springs or number of diodes, respectively.

Circuit Operation

Figure 2 is the circuit diagram of the Synthesizer. The layout of subcircuits corresponds roughly to the block diagram Figure 1. We will first describe the circuit operation in the logically simplest mode of operation, the normal mode with all short-cut circuits eliminated.
In Figure 2, then, we assume the mode of operation switch in the "Normal" position N, the relay Q operated (eliminates permutation short cuts) and the number of state switches M set at "Normal." Since the Synthesizer is essentially a closed loop system, it is difficult to find a point at which to start a description of its operation. It is perhaps simplest to assume that the machine has just finished testing one function on the tape. The relay H may then be assumed to have just operated, locking in to the make on R_s, since the tape reader will be at the division line between functions and consequently R_s operated. Operation of H releases the hold on the memory relays (M0, M1, ..., M15) and also the hold on the steering counter relays (W_M1 Z_M1, W_M2 Z_M2, W_M4 Z_M4), thus resetting this counter to zero. It also applies voltage to the teletype magnet which, a moment later, will pull free of the tape and hence release R_s. This releases H and reconnects the holds of the steering counter and the memory relays. It also establishes a path to the slow relay SO through its own back contact SO'. SO now acts like a slow buzzer, producing pulses at a rate of about six per second, and relay U follows these pulses through the SO make contact. The pulses produced by U operate the teletype magnet, advancing it line by line until it reaches the line with an R_s hole, at which point the back contacts on R_s open both the buzzer circuit to SO and the teletype magnet circuit through U. The pulses produced by U are also fed into the three-stage binary counter consisting of three WZ pulse dividers W_M1 Z_M1, W_M2 Z_M2, W_M4 Z_M4. This counter, therefore, keeps track of the line of tape, counting from the last division between two tape functions (R_s hole). This counter controls the steering trees leading into the memory relays M0, M1, . . .
, M15 and the number of state relays V1, V2, V4, V8, V16. The first line of tape after the R_s line is fed into M0, M1, M2, M3, the second line into M4, M5, M6, M7, the third into M8, M9, M10, M11, the fourth into M12, M13, M14, M15 and the fifth into V1, V2, V4, V8, V16. A section of the tape is shown in Figure 3. The completion of this tape reading operation, indicated by closure of R_s, puts ground on lead 106 leading into the permutation-negation network.

Permuting and Negating Circuits

These circuits enable the machine to apply the 384 negation and permutation operations to the tape function stored in the memory to compare it with the desired function set on the input switches. The negation-permutation sequencer consists of nine WZ pairs connected in a form of counting circuit which can go through 384 different states. Starting from the high-speed (pulsed) end of this circuit, the first five WZ pairs, E, D, B, C and A, relate to permutations and can go through twenty-four states corresponding to the 4! = 24 permutations of the four variables. The other four stages w, x, y, z relate to negating the variables and can go through sixteen states corresponding to the sixteen ways of negating four variables. In combination this gives 384 states. In the circuit, imagine Q operated, F9 and F16 released and that F0 is pulsed, so that a series of pulses is applied to line 109. The negation-permutation sequencer will then proceed through the 384 negation-permutation operations. This sequence is shown in the accompanying Table I for the first twenty-four of these, i.e., a full set of permutations. At the twenty-fourth step this sequence repeats for the permutation relays but a pulse is applied at lead 250, advancing the negating relays one step. The negating relays go through the sequence shown in Table II, advancing one step after the permuting relays have gone through a full set of permutations. In this manner the full set of 16 x 24 combinations is exhausted.
Table I
Sequence of Permutations

[The columns giving the states of relays W_A W_B W_C W_D W_E (1 means operated) and of relays A B C D E are not legible in this copy.]

Step    W X Y Z becomes
 0      W X Y Z
 1      W Y Z X
 2      W Z Y X
 3      W Y X Z
 4      W Z X Y
 5      W X Z Y
 6      Y X W Z
 7      Z Y W X
 8      Y Z W X
 9      X Y W Z
10      X Z W Y
11      Z X W Y
12      X Y Z W
13      Y Z X W
14      Z Y X W
15      Y X Z W
16      Z X Y W
17      X Z Y W
18      X W Z Y
19      Y W X Z
20      Z W X Y
21      Y W Z X
22      Z W Y X
23      X W Y Z

Table II
Sequence of Negations

[The columns giving the states of relays W_W W_X W_Y W_Z and of relays W X Y Z are not legible in this copy.]

Variables W X Y Z become
W  X  Y  Z
W  X  Y  Z'
W  X  Y' Z'
W  X  Y' Z
W  X' Y  Z
W  X' Y  Z'
W  X' Y' Z'
W  X' Y' Z
W' X' Y  Z
W' X' Y  Z'
W' X' Y' Z'
W' X' Y' Z
W' X  Y  Z
W' X  Y  Z'
W' X  Y' Z'
W' X  Y' Z

At the end of this sequence, a ground is applied to line 135 which initiates reading in a new function. It may also be noted that if relay Q is released and F16 is operated a ground is applied directly to line 250, the input to the negating part of the counter. This will cause the counter to skip a set of permutations and advance directly in the negating sequence by one step. Operation of F16 also releases the plus side of the permutation relays in the sequencer, resetting them to zero. The function of F16 is to short-cut some of the calculation in certain cases as will be described later. In a similar way, operation of F9 with Q released advances the W_A and W_B parts of the permutation sequence by one step, skipping a subset of six permutations in which W_C, W_D and W_E take part. F9 releases the plus to these three WZ pairs, resetting them to zero. This also is used for short-cut purposes.
The permuting and negating relays A, B, C, D, E and W, X, Y, Z are operated from back contacts of the corresponding W relays in the WZ pairs of the sequencer. Thus they assume the complementary states as shown in Tables I and II. The function of these nine sets of relays is to interchange sixteen leads representing the function in the memory relays in accordance with the permutation and negation in the sequencer. The logical organization of this circuit can be represented in a symbolic form by Figure 4, which indicates the effect of the negating and permuting relays on the variables of the tape function (not the effect on the sixteen leads). Thus, the W relay negates the variable W when released, the X relay negates X, etc. The A relay interchanges W and X and also Y and Z when released, the B relay interchanges the variables now appearing (after the possible A interchange) on the first and third lines, etc. It will be found that the twenty-four combinations of A, B, C, D, and E produced by the sequencer (Table I) lead to the twenty-four permutations of the four variables as shown in Table I. Now the circuit does not work with the four Boolean variables but with sixteen lines representing the sixteen states of the four variables. Negating a variable, say W, corresponds to interchanging the eight lines (or states) for which W is 1 with the corresponding eight lines for which W is zero. Thus in the permuting circuit, the W negation box of Figure 4 becomes eight reversing or interchanging circuits operated by the relays W1, W2, W3, W4. A similar statement applies to the negation of the other variables and the permuting of the variables by the A, B, C, D and E relays. To summarize, the sequencer can go through 384 states representing the 384 permutations and negations. The negating-permuting network sets up the corresponding interchanges of the sixteen lines from the memory to the input switches.
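The statement that negating a variable amounts to interchanging eight pairs of lines can be illustrated with a small sketch. The representation is an assumption made here: the sixteen lines are a Python list indexed by state, with variable number 0 in the lowest bit of the index, and the function names are hypothetical.

```python
def negate_variable(lines, var):
    """Interchange each line with the line whose state differs only in var."""
    return [lines[state ^ (1 << var)] for state in range(16)]


def swap_variables(lines, a, b):
    """Interchange the lines so that variables a and b exchange roles."""
    def partner(state):
        bit_a = (state >> a) & 1
        bit_b = (state >> b) & 1
        if bit_a != bit_b:
            state ^= (1 << a) | (1 << b)  # exchange the two differing bits
        return state
    return [lines[partner(state)] for state in range(16)]


def interchanged_pairs(var):
    """The eight pairs of lines exchanged when variable var is negated."""
    return {frozenset((state, state ^ (1 << var))) for state in range(16)}
```

Under this indexing, the lines for states 0 and 15 are never moved by `swap_variables`, since all four bits of those states are equal.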
At the memory end, these lines are given plus or minus voltage according as the memory function is open or closed. At the input switch end, after the permutation and negation, these voltages are compared with the settings of the input switches. There are two types of comparison circuits. The first type, Figure 5, applies to switches 0, 7, 8 and 15. It will be seen that F_k will operate if the lead from the permuting network is positive and the switch is set at "closed," or if the lead is negative and the switch is set at "open," i.e., if there is a disagreement between the switch setting and the value coming in from the permuting network. If the switch is set at "don't care," F_k will not operate. It will also be seen that the red and green lights will indicate "closed" and "open" settings of the switch respectively, while if set at "don't care" the red or green light will indicate minus or plus coming in from the permuting network. The comparison circuit for the other switches is somewhat different. There are two relays F1 and F2 common to all the other switches. If a particular switch is set at "closed," the line from the permuter goes through a diode to F1, the other side of F1 being minus (when the test is made). Thus F1 will operate if a plus appears on the line from the permuter (disagreeing with the "closed" position of the switch). If the switch is set at "open," the path from the permuter goes through the same diode but in the opposite direction to F2, whose other side is connected to plus. Hence F2 will operate if a minus comes in from the permuter. The red and green lamps are connected substantially as before. Returning now to the description of the operating sequences in the machine, we recall that the completion of tape reading of a function into the memory was signified by closure of R_s. This applies ground at lead 106 into a long "equality chain" of contacts.
This chain is closed only if all of the W relays in the WZ pairs of the sequencer agree in position with their corresponding Z relays. This being true, ground is applied to the permuting and negating network, and, as already described, one or more of the F relays (F_0, F_7, F_8, F_15, F_1, F_2) will operate unless the tape function as permuted through the network agrees with the input function. Assuming there is a disagreement, at least one of the secondary failure relays will operate, grounding the input to the negation-permutation sequencer. This advances the W relays of the sequencer one step in the sequence, and causes a disagreement between at least one of the W relays and its corresponding Z relay in the WZ pairs. This disagreement, in turn, opens the "equality chain," releasing the F relays which, in turn, removes the ground from the sequencer and allows its Z relays to follow their corresponding W relays. When equality has again been established, ground is again applied through the "equality chain" to the permuting network and the next permutation of the sequence (now set up on the permuting network) is tested in the same way. This cycle of operations continues until the full set of permutations and negations has been tested. After the last permutation, the next ground goes through a Z_w contact and the mode of operation switch to operate H, signifying the completion of tests on the current function and initiating reading the tape for the next function as previously described.

If, at some point, the permuted tape function matches the input function, no F relay will operate and the cycle is stopped. Relay J will operate and, in turn, L through the chain of back contacts on the F relays. The operation of L rings the gong indicating a solution, and pulses the message register for counting purposes.

Short-Cut Operation

We now describe the short-cut provisions. If the short-cut eliminator is "off," relay Q will release, rearranging the inputs to the sequencer.
In the permuting network it will be seen that the lines on the zero level and on the 15 level are not switched after the vertical column of Z contacts, i.e., after emerging from the negating part of this circuit. This means that if a disagreement occurs on either of these lines, it will persist throughout all the permutations, which only change the switches A, B, C, D and E in this network. Hence, in case of such a disagreement it is not necessary to test all of these permutations but the machine can proceed immediately to the next negation, saving a great deal of time. In the circuit, when Q is released, operation of F_0 or F_15 pulses directly into the negating part of the sequencer and resets the permuting part to zero.

In a similar manner, it will be seen that the lines at the 7 and 8 level in the permuter are not switched after the B contacts. This means that a disagreement on either of these lines, indicated by operation of F_7 or F_8, will persist over the subset of six permutations in which C, D and E change. Hence it is unnecessary in such a case to test each of these individually and the machine advances to the next permutation involving a change of A or B. In the sequencer, a ground is applied at the input to the A, B stages and the C, D, E stages are reset to zero. This is done by a secondary failure relay which will operate if either F_7 or F_8 indicates disagreement.

One further short-cutting device has been incorporated in the machine. With each tape function is included, in binary form, the number of states for which that function is closed. As previously described, this number is stored in the relays V_1, V_2, V_4, V_8, V_16 when the function is read off the tape. On the front panel of the machine are two seventeen-point switches labeled Max and Min. The Min switch should be set at a number equal to the number of input switches in the "closed" position. The Max switch should be set at this number plus the number of "don't cares".
Now, regardless of how the "don't cares" may be filled in, the number of closed states will be within this range (including the end points). A function from the tape could not possibly be satisfactory unless its number of states lies within this range. The machine is arranged to compare these numbers and, if this condition is not satisfied, to skip the function completely and go immediately to the next function on the tape.

The comparison is carried out in the "number of states comparison circuit". The contacts on the V relays are arranged in the topological dual of an ordinary tree. This implies that if the number n is registered (in binary form) in the V relays, then all of the vertical leads labeled zero to n at the Min switch will be connected together, but the two groups are not connected. It will be seen, therefore, that if the number on the V relays lies in the range covered by the Max and Min settings, then the Max and Min swingers will not be connected. If the V number is outside this range then the Max and Min swingers will be connected. If the Max and Min swingers are connected, the operation of R closes a path to operate H and start reading in a new function immediately.

It is necessary to use five relays, V_1, V_2, V_4, V_8 and V_16, to represent all of the numbers from 0 to 16 inclusive, but there were only four holes readily available on the tape for reading into these relays. Consequently four of the relays are read into directly through the steering relays, and a special artifice is used to get the fifth digit stored in V_16. Since the only case in which this digit equals 1 is when the number of states is 16, and all the other four relays are released, this relay is operated through the back contacts of V_1, V_2, V_4 and V_8 in series. But since V_1, V_2, V_4 and V_8 are also all released when the number of states is 0, a contact of M_0 is also included in the operate path, to distinguish between these two cases.
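The Max and Min test amounts to a simple range check, which the following Python sketch illustrates. The string values "closed", "open" and "dont_care" and the function names are hypothetical, chosen only for this example; the relay circuit itself performs the comparison with a contact network, not arithmetic.

```python
def closed_state_range(switches):
    """Return the (Min, Max) settings implied by the sixteen input switches."""
    n_closed = sum(1 for s in switches if s == "closed")
    n_dont_care = sum(1 for s in switches if s == "dont_care")
    return n_closed, n_closed + n_dont_care


def skip_function(tape_closed_count, switches):
    """True if the tape function cannot possibly match and may be skipped."""
    lo, hi = closed_state_range(switches)
    return not (lo <= tape_closed_count <= hi)


if __name__ == "__main__":
    switches = ["closed"] * 3 + ["dont_care"] * 2 + ["open"] * 11
    print(closed_state_range(switches))
    print([n for n in range(17) if not skip_function(n, switches)])
```

With three "closed" switches and two "don't cares", only tape functions closed in three, four or five states need to be run through the permutation sequence.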
Without the short-cutting features the average time of solution for a completely specified function would be over an hour; with short cuts it is about five minutes.

Indicating Circuits

A set of indicating lights is provided which shows the permutation and negation that must be applied to the tape function (when a solution has been found) to transform it into the function on the input switches. The eight negating lights are connected in a simple fashion to the W, X, Y and Z coils. If the W relay is out, for example, the W' lamp lights up by a current through the W coil (not sufficient to operate the W relay). If the W relay is operated, the W lamp lights up by current through the W_w contact.

The circuit for the permuting part is more complex. However, on tracing through the circuits it will be found that the lights always receive proper voltages to indicate the permutation set up on the A, B, C, D, E relays. For example, in the first (identity) permutation, A, B, C, D and E are all operated. It will be seen that the eight center points between pairs of lamps receive the following voltages ([symbol illegible] indicates floating): + [...] + - . Hence the diagonal series of lamps

W - - -
- X - -
- - Y -
- - - Z

will be lighted. Note that the lamps connected to floating points receive half voltage by a sneak path through the two lamps in series. This is not sufficient to illuminate them perceptibly.

Another permutation indicating light circuit has been provided for trouble shooting and for better observation of the machine while in action. This consists of twenty-five small neon lamps. Twenty-four of these correspond to the twenty-four permutations of the variables. These are arranged in a rectangle six wide and four high. In operation without short cuts, these lamps light sequentially from left to right across the first row, then across the second, etc. In short cuts due to the F_0 and F_15 relays the whole pattern of twenty-four permutations is skipped.
In short cuts due to F_7 and F_8 a horizontal row in this display is skipped (only the first lamp of the row going on). The circuit controlling these lights consists of a tree on relays A and B which selects the row and a second tree on C, D and E which selects the column. Only the lamp at the intersection point will go on. Sneak paths through other lamps all involve at least three lamps in series and the voltage is not sufficient for breakdown of such a series combination.

The twenty-fifth lamp is connected to light up if the C, D and E relays get into either of the two other possible states which do not correspond to permutations in the regular sequence of operations. It can thus indicate certain trouble conditions.

Other Modes of Operation

With the mode of operation switch set in the P position (periodic), the machine does not advance the tape after the sequence of permutations and negations but periodically goes through the tests on the function in the memory. In this switch position the path to the H relay, which ordinarily initiates the tape reading process, is open. This mode is sometimes useful for trouble shooting.

In the S position (step-by-step), the machine tests a permutation and then stops until the Run switch is operated and released. The path which normally puts ground on the failure relays is opened and replaced by a contact on the Run switch connected to a condenser. When the Run switch is off, this condenser charges, and when pressed for a step in the operation it discharges through one of these relays. Only enough charge is stored to operate these relays once. For the next step the Run switch must be released and pressed again.

In the L mode (low-speed), the machine operates as in the normal mode except at a much lower speed. This is achieved in a fashion similar to the step-by-step operation but with the function of the Run switch replaced by relay N.
The N relay is operated by the G relay which is connected in a relaxation oscillator circuit using a gas tube. The condensers charge up sufficiently to break down the gas tube which operates G, closing its make contact and discharging the condenser which then starts recharging. This slow oscillation of G causes N to oscillate slowly which, in turn, allows the solution to proceed at a slow rate.

In Mode Q (self-restarting), the machine does not stop at a solution but rings the gong, pulses the message register, and then proceeds to the next permutation or negation in the sequence. When a solution is reached in this mode, the operation of relay L causes the message register to operate. This releases relay K which releases the message register and also applies voltage to slow-operate relay G. Operation of G energizes N, which in turn advances the permutation sequencer one step and also energizes K. K locks in, releasing G and, in turn, N, and the solution proceeds. This mode of operation can be used to find all of the solutions to the given problem, rather than just the first one.

C. E. SHANNON
E. F. MOORE

Att: Appendices A and B
Photographs 214140 through 214143
Figures 1 through 5

Appendix A
Main Components and Their Functions

Relays and Other Electromagnetic Components

M_0, M_1, ..., M_15: Memory relays. These register the values of the function read off the tape for its sixteen possible states. If M_i is operated, the function is closed in state i.

W_1, W_2, W_3, W_4: Four parallel relays (to give sufficient contacts). These relays negate the variable W of the tape function. This is done in the negating and permuting network by interchanging the eight leads corresponding to the variable W = 1 with the corresponding eight leads for which the variable W is zero.

X_1, X_2, X_3, X_4: Similar negating relays for the variable X.

Y_1, Y_2, Y_3, Y_4: Similar negating relays for the variable Y.

Z_1, Z_2, Z_3, Z_4: Similar negating relays for the variable Z.

A, B, C, D, E (each a set of parallel relays): Permuting relays. The function of these relays is to permute the sixteen lines from the memory relays according to the various permutations of the variables W, X, Y and Z in the tape function. By suitable combinations of operation and release of these five sets of relays, the interchanges corresponding to any of the twenty-four permutations are possible.

W_w Z_w, W_x Z_x, W_y Z_y, W_z Z_z, W_a Z_a, W_b Z_b, W_c Z_c, W_d Z_d, W_e Z_e: WZ relays arranged in a counting circuit to go through the 384 permutations and negations applied to the sixteen leads in the permuter. These WZ pairs control the preceding W, X, ..., E relays; thus W_1, W_2, W_3, W_4 are controlled by the W relay of the W_w Z_w pair.

F_0, F_7, F_8, F_15: Failure relays. Operation of F_0, for example, corresponds to failure of the permuted line coming into switch 0 to match the value on input switch I_0. Operation of a failure relay causes the machine to proceed to try another permutation or tape function.

F_1, F_2: These are failure relays which are operated by a failure to match on any of the other switches not taken care of specifically by F_0, F_7, F_8 or F_15.

Secondary failure relays (three): These are operated by the preceding failure relays and sort out the type of short cut (if any) available. One causes the permuter to advance to the next negation (skipping all permutations of the current negation). Another causes the permuter to skip the current subset of six permutations out of the twenty-four, advancing the AB part of the permutation one unit. The third causes an advance of only one in the permutation.

R_0, R_1, R_2, R_3, R_s: These relays are controlled by the five fingers of the tape reading mechanism. For example, a hole in the 2 row of the tape operates R_2. R_0, R_1, R_2, R_3 carry information to the memory relays M_0, ..., M_15 and also to the number of state relays V_1, V_2, V_4, V_8. R_s marks the end of data relating to one function on the tape.

S_1, S_2, S_3, S_4: Steering relays. These relays steer, by means of four trees, the tape readings on R_0, R_1, R_2, R_3 into the memory relays and the number of state relays V_1, V_2, V_4, V_8.

[Designations illegible]: [...] sequence the steering for successive lines of tape into the appropriate memory and number of state relays.

V_1, V_2, V_4, V_8, V_16: Number of state relays. These relays register in binary form the number of states for which the function currently in the memory relays is closed.

W_s Z_s: A WZ pair for operating the card display unit. It causes successive functions on the tape to operate alternately the right and left solenoids S_r and S_l of the display unit.

S_r, S_l: Right and left solenoids of the display unit for releasing cards one by one from the stack.

H: End-of-permutations relay. This operates when the machine has tested all permutations of the current tape function, and initiates analysis of the next function on the tape.

L: Success relay. This operates when the machine finds a solution to the problem.

Q: Short cut eliminator. When operated, this relay eliminates short cuts in the permutation sequence.

J: A delaying and checking relay in the basic closed loop of the system. J operates when all of the WZ pairs in the permutation counter are in agreement.

SO: Slow-operate relay in a buzzer circuit for producing pulses to step the tape via relay U.

U: Secondary relay operated by SO.

G: Reed relay in a slow relaxation oscillator circuit for controlling low-speed operation via secondary relay N.

N: Secondary relay controlled by G.

K: Control relay relating to low-speed and self-restarting modes of operation.

Message register for counting solutions to a problem.

A relay for connecting the 110 volt supply only when the 24 volt supply is on.

A bell operated by L which sounds when a solution is found.

A five-hole teletype tape transmitter. The standard functions are arranged on tape in order of increasing numbers of contacts.

Appendix B
Manually Operated Switches

Problem input switches. These switches have three positions, "open," "don't care," and "closed," and are set to correspond to the desired characteristics of the circuit to be designed in its sixteen states.

Mode of operation switch. This is a five-position switch which determines the mode of operation of the machine. In clockwise order these modes are:
P = Periodic. It continues cycling through the same permutations without advancing to the next function.
Q = Step-by-step. In this mode the machine tests the permutations one at a time under control of the key switch. This switch must be pressed once for each permutation.
N = Normal operation. Runs at regular speed to the first solution and then stops.
S = Self-restarting. At each solution, it rings the gong and adds a count to the message register, and then advances to the next solution.
L = Low-speed. Similar to normal, but at low speed for demonstration and test purposes.

Short cut eliminator. In the "On" position this switch operates relay Q and eliminates short cuts in the permuting sequence.

Next function button. Pressing this pushbutton operates relay H, causing the machine to advance to the next function on the tape, omitting any remaining permutations of the current function.

Starts the machine operating by closing its fundamental operating feedback loop.

Turns power on for the machine.

Max and Min switches. Both of these switches have seventeen points labeled 0, 1, 2, ..., 16; the Min switch has an additional point labeled "Normal". In use, the Min switch is set at the number of states for which the function to be designed is closed.
The Max switch is set at this number plus the number of "don't care" states. The machine then skips functions from the tape whose number of closed states do not lie in this range, thus shortening the solution time. If the Min switch is set at "Normal" this shortening feature is eliminated.

April 3, 1954

Both experience and intuition suggest that a function of time $f(t)$ which is bounded in amplitude range ($|f(t)| \le A$) and in bandwidth (the spectrum vanishes for angular frequencies greater than $\omega_0$) has bounded slope, a bounded second derivative, etc., and that there is a certain minimum time required to go from a maximum negative to a maximum positive amplitude. Indeed, one feels that the maximum slopes, and higher derivatives, and the fastest rise times will occur with a sine wave having the highest allowed amplitude and the highest allowed frequency. This note establishes some theorems of this general sort.

Theorem I: Let the function $f(t)$, of integrable square, be both amplitude limited and band limited:

$$|f(t)| \le A \quad \text{all } t$$

$$F(\omega) = 0 \quad |\omega| > \omega_0$$

where $F(\omega)$ is the Fourier transform of $f(t)$. Then

$$|f'(t)| \le A\omega_0, \qquad |f''(t)| \le A\omega_0^2, \qquad \ldots, \qquad |f^{(n)}(t)| \le A\omega_0^n \qquad \text{all } t.$$

Proof: If we can prove the theorem for a particular $t$, it will follow for all $t$, since we can shift $f(t)$ along the time axis without affecting the assumptions of the theorem or its conclusions. We will prove the theorem for the particular time $t_1 = \pi/2\omega_0$. Now apply the sampling theorem to $f(t)$, expanding it in terms of its samples:

$$f(t) = \sum_{n=-\infty}^{\infty} a_n \frac{\sin(\omega_0 t - n\pi)}{\omega_0 t - n\pi}$$

$$f'(t) = \sum_{n=-\infty}^{\infty} a_n \frac{\omega_0(\omega_0 t - n\pi)\cos(\omega_0 t - n\pi) - \omega_0 \sin(\omega_0 t - n\pi)}{(\omega_0 t - n\pi)^2}$$

At $t = t_1$ the cosine terms vanish and the sine terms have unit magnitude, so that

$$|f'(t_1)| \le \sum_{n=-\infty}^{\infty} |a_n| \frac{\omega_0}{\pi^2 (n - 1/2)^2}$$

since the absolute value on $a_n$ makes all terms positive. Now $a_n$ is the value of $f(t)$ at $t = n\pi/\omega_0$; consequently $|a_n| \le A$. Hence

$$|f'(t_1)| \le \frac{A\omega_0}{\pi^2} \sum_{n=-\infty}^{\infty} \frac{1}{(n - 1/2)^2} = A\omega_0.$$

This proves the desired result for the first derivative.
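The last step of the proof rests on the fact that the sum of $1/(n - 1/2)^2$ over all integers $n$ equals $\pi^2$, so that the factor $1/\pi^2$ cancels exactly. A quick numerical sanity check of that sum, as a minimal Python sketch (the function name `partial_sum` is our own):

```python
import math


def partial_sum(n_max):
    """Sum of 1/(n - 1/2)^2 for n = -n_max .. n_max."""
    return sum(1.0 / (n - 0.5) ** 2 for n in range(-n_max, n_max + 1))


if __name__ == "__main__":
    for n_max in (10, 100, 10000):
        print(n_max, partial_sum(n_max), math.pi ** 2)
```

The partial sums increase toward $\pi^2$ from below, the neglected tail being roughly $2/n_{\max}$.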
The results for higher derivatives can be obtained inductively. $f'(t)$ is band-limited, of integrable square, and, as we have just shown, amplitude limited to $A\omega_0$. Hence, $f''$ will be amplitude limited by:

$$|f''(t)| \le (A\omega_0)\,\omega_0 = A\omega_0^2$$

and by obvious induction

$$|f^{(n)}(t)| \le A\omega_0^n.$$

It will be noted that these bounds are the maximum derivatives that would be obtained for a sine wave of the highest allowed amplitude and frequency, $f(t) = A \sin \omega_0 t$. While such a wave does not satisfy our integrable square assumption, it is possible to approximate the bounds given as closely as desired by taking a sine wave of nearly top frequency and nearly top amplitude and multiplying it by a very slowly decaying function of the type $\frac{\sin kt}{kt}$ ($k$ very small). This produces a function satisfying all the conditions with maximum derivatives approximating to the upper bounds given. Consequently these bounds are the best possible.

We now consider the problem of total rise of a function over an interval. Again we would conjecture that the shortest time for a rise from negative peak to positive peak amplitude would be obtained by use of a sine wave of the greatest allowed frequency and amplitude and hence would be $\pi/\omega_0$ seconds. We have not been able to prove a result quite this good but will show the following:

Theorem II: Under the same conditions on $f(t)$ as in Theorem I, it takes at least $3\tfrac{1}{12}/\omega_0$ seconds for $f(t)$ to change from $-A$ to $+A$.

Proof: We will show that if $f(0) = -A$, and $f(t_3) = +A$, then $f'(t)$ for $0 \le t \le t_3$ lies always under or on the curve $g(t)$ shown in Figure 1. This curve consists of five sections, a straight line segment of slope $A\omega_0^2$, a parabolic segment whose second derivative is $-A\omega_0^3$ and which is tangent to the first segment and to the third segment, a horizontal straight line at height $A\omega_0$. The last two segments are reflections of the first two.
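The figure $3\tfrac{1}{12}$ in Theorem II can be checked directly from this description of $g(t)$. The sketch below is only an illustrative calculation: it assumes the curve is exactly as described (slope $A\omega_0^2$, parabola with second derivative $-A\omega_0^3$, flat top at $A\omega_0$), works in the normalised units $\tau = \omega_0 t$ and $h = g/A\omega_0$, and asks how wide the curve must be for its area to reach $2A$. The function name is hypothetical.

```python
from fractions import Fraction


def min_rise_time():
    """Least t3 (in units of 1/w0) for which the area under g reaches 2A.

    Normalised units: tau = w0 * t and h = g / (A * w0), so the sloping
    line is h = tau, the parabola has h'' = -1 and the flat top is h = 1.
    """
    # Parabola starting on the line at tau = a: h = a + u - u**2 / 2.
    # Its slope is zero at u = 1, where h = a + 1/2 must equal 1.
    a = Fraction(1, 2)
    line_area = a * a / 2                                   # 1/8
    parabola_area = a + Fraction(1, 2) - Fraction(1, 6)     # 5/6
    rising_area = line_area + parabola_area                 # 23/24
    rising_width = a + 1                                    # 3/2
    # Area is measured in units of A; two mirrored flanks plus a flat top.
    flat_width = 2 - 2 * rising_area                        # 1/12
    return 2 * rising_width + flat_width


if __name__ == "__main__":
    print(min_rise_time())   # 37/12, i.e. 3 1/12
```

The result, 37/12 or about 3.083, falls a little short of the conjectured value $\pi \approx 3.142$ for a sine wave.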
In the first place, if $f(0) = -A$, then $f'(0) = 0$, for $f(t)$ is an entire function because of the band limitation, and if $f'(0)$ were not equal to zero, $f(t)$ would run outside its amplitude limit $A$ in the neighborhood of zero. Now

$$f'(t) = f'(0) + \int_0^t f''(t)\,dt \le 0 + \int_0^t |f''(t)|\,dt \le \int_0^t A\omega_0^2\,dt = A\omega_0^2 t.$$

Hence $f'(t)$ lies under or on the sloping straight line section. Also $f'(t) \le A\omega_0$, so it lies under the horizontal segment. Next we show that it cannot lie in the small triangular shaped region T. Suppose in contradiction that $f'(t)$ did lie in this region, passing through a point $p$ at $t = t_0$ as shown. At $t_0$ we have either (A) $f''(t_0) \ge g'(t_0)$ or (B) $f''(t_0) < g'(t_0)$. Assume first case (A). We may write

$$f'(t_2) = f'(t_0) + (t_2 - t_0)\, f''(t_0) + \int_{t_0}^{t_2}\!\!\int_{t_0}^{t} f'''(\tau)\,d\tau\,dt. \qquad (1)$$

We also have

$$g(t_2) = g(t_0) + (t_2 - t_0)\, g'(t_0) + \int_{t_0}^{t_2}\!\!\int_{t_0}^{t} g''(\tau)\,d\tau\,dt. \qquad (2)$$

The three right-hand members of (1) dominate the corresponding members of (2): $f'(t_0) > g(t_0)$ since we assumed $f'(t_0)$ in the triangular region; $f''(t_0) \ge g'(t_0)$ since we are assuming case (A); $f'''(t) \ge g''(t)$ since the $g$ curve has the greatest negative second derivative allowed by Theorem I. We conclude that $f'(t_2) > g(t_2)$, and the $f'$ curve is over the horizontal line at $t_2$, a contradiction which excludes case (A).

A similar argument applies to case (B), working backward to the point $t_1$. In equations (1) and (2), read $t_1$ for $t_2$ and notice that the coefficient $(t_1 - t_0)$ now becomes negative. This allows the same argument to go through with the condition reversed on the relation of $f''(t_0)$ and $g'(t_0)$, and the resulting contradiction excludes case (B), which shows the impossibility of a curve in the triangular region.

An exactly similar argument working backward from $t_3$ shows that $f'(t)$ must lie under or on the right-hand sloping line and curved segment. Now $f'(t)$ is always under $g(t)$, so the area under $f'(t)$ is at most the area under $g(t)$.
In order that $f(t)$ run from $-A$ at $t = 0$ to $+A$ at $t_3$ the area under $f'(t)$ must be at least $2A$ and hence so must that under $g(t)$. A simple integration of the $g(t)$ curve shows that this requires $t_3 \ge 3\tfrac{1}{12}/\omega_0$. This proves the desired result.

It would no doubt be possible to improve the value $3\tfrac{1}{12}$ by more elaborate arguments of the same general type, finding better $g(t)$ functions with properly bounded values of $g'''(t)$, $g^{iv}(t)$, etc. It seems difficult however to obtain the conjectured value by this method.

C. E. SHANNON

[Fig. 1]

Bell Telephone Laboratories, Incorporated
Cover Sheet for Technical Memorandum

Subject: Concavity of Transmission Rate as a Function of Input Probabilities - Case 2067o*
MM-55-114-23
Date: June 8, 1955
Author: C. E. Shannon
Filing subject (to be assigned by author): Information Theory

Copies to: Case File, Date File, Area Central Files (4)
1 - HWB-WOB-JBF
2 - H. W. Bode
3 - W. R. Bennett
4 - H. S. Black
5 - C. A. DeSoer
6 - E. N. Gilbert
7 - R. E. Graham
8 - D. W. Hagelbarger
9 - J. L. Kelly
10 - S. P. Lloyd
11 - L. A. MacColl
12 - B. McMillan
13 - E. F. Moore
14 - J. R. Pierce
15 - S. O. Rice
16 - D. Slepian

ABSTRACT

The following theorem is proved: In a discrete noisy channel without memory the rate of transmission R is a concave downward function of the probabilities $P_i$ of the input symbols. Hence any local maximum of R will be the absolute maximum or channel capacity C.

Concavity of Transmission Rate as a Function of Input Probabilities - Case
MM-55-114-23
June 8, 1955

MEMORANDUM FOR FILE

Theorem: In a discrete noisy channel without memory, the rate of transmission R is a concave downward function of the probabilities $P_i$ of the input symbols. Hence, any local maximum of R will be the absolute maximum or channel capacity C.

Proof: We have

$$R = H(y) - H_x(y) = -\sum_i Q_i \log Q_i - \sum_i P_i a_i$$

where the $Q_i$ are the probabilities of the various received symbols and $a_i$ is the conditional entropy of the received symbol when the transmitted symbol is the i-th one.
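The rate just defined, and the concavity claimed by the theorem, can be checked numerically on a small example. The Python sketch below is illustrative only: the channel (a binary symmetric channel with crossover probability 0.1), the use of base-2 logarithms, and the names `rate` and `is_concave_between` are all assumptions of this example, not part of the memorandum. Here `channel[j][i]` plays the role of $p_j(i)$, the probability of receiving symbol i when symbol j is sent.

```python
import math


def rate(P, channel):
    """R = H(y) - H_x(y) in bits; channel[j][i] is p_j(i)."""
    n_out = len(channel[0])
    Q = [sum(P[j] * channel[j][i] for j in range(len(P)))
         for i in range(n_out)]
    H_y = -sum(q * math.log2(q) for q in Q if q > 0)
    # a[j]: conditional entropy of the output when symbol j is sent.
    a = [-sum(p * math.log2(p) for p in row if p > 0) for row in channel]
    return H_y - sum(P[j] * a[j] for j in range(len(P)))


def is_concave_between(P1, P2, channel, steps=10):
    """Check R(mixture) >= mixture of R along the segment from P1 to P2."""
    r1, r2 = rate(P1, channel), rate(P2, channel)
    for k in range(steps + 1):
        t = k / steps
        mix = [t * p + (1 - t) * q for p, q in zip(P1, P2)]
        if rate(mix, channel) < t * r1 + (1 - t) * r2 - 1e-12:
            return False
    return True


if __name__ == "__main__":
    bsc = [[0.9, 0.1], [0.1, 0.9]]
    print(rate([0.5, 0.5], bsc))
    print(is_concave_between([0.9, 0.1], [0.2, 0.8], bsc))
```

A check of this kind along a few segments is no substitute for the proof, but it makes the statement concrete: the rate at any mixture of two input distributions is at least the same mixture of their rates.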
A condition for concavity of R is that $\partial^2 R / \partial P_j \partial P_k = R_{jk}$ be a negative semi-definite form.* We have

$$\frac{\partial R}{\partial P_j} = -\sum_i (1 + \log Q_i)\, p_j(i) - a_j$$

using the fact that $Q_i = \sum_j P_j\, p_j(i)$. Hence

$$R_{jk} = -\sum_i \frac{1}{Q_i}\, p_j(i)\, p_k(i)$$

$$\sum_{jk} R_{jk}\, \Delta P_j\, \Delta P_k = -\sum_{ijk} \frac{1}{Q_i}\, p_j(i)\, p_k(i)\, \Delta P_j\, \Delta P_k = -\sum_i \frac{1}{Q_i} \Big(\sum_j p_j(i)\, \Delta P_j\Big)\Big(\sum_k p_k(i)\, \Delta P_k\Big) = -\sum_i \frac{1}{Q_i} \Big(\sum_j p_j(i)\, \Delta P_j\Big)^2 \qquad (1)$$

*See "Inequalities," Hardy, Littlewood and Polya, Cambridge 1934, p. 80.

This displays the sum as necessarily non-positive, since all terms are non-positive, and consequently shows that $R_{jk}$ is negative semi-definite and R a concave function. The simplicity of the formula (1) for the second derivative of R in an arbitrary direction is quite striking.

A corollary to this result is the following: Consider the set s of points $(P_1, P_2, \ldots, P_n)$ with $\sum P_i = 1$ for which R has its maximum value. Normally, of course, there is only one point in the set, but in other cases it is not so limited. Our theorem allows us to deduce that s is always a convex set of points, for if R is maximized at $(P_1, \ldots, P_n)$ and also at $(P_1', \ldots, P_n')$, it must clearly have the same value at $(\alpha P_1 + (1-\alpha) P_1', \ldots, \alpha P_n + (1-\alpha) P_n')$.

C. E. SHANNON

[Figures for the relay circuit synthesizer. Panel labels: steering circuit; card display circuit; memory relays; teletype contacts; tape input circuit; number of states comparison circuit; permutation-negation circuit; input switches; short cut eliminator; lamp indications; neon lamps. Fig. 2: Sequencing Circuit of Synthesizer (B-362338). Further drawings B-362340 and B-362745, June 30, 1954. Fig. 5: comparison circuit from permuter, with switch positions "open," "don't care" and "closed."]
A SKELETON KEY TO THE INFORMATION SEMINAR NOTES

The material in these notes has not for the most part been published and is for personal use only.

The notes are not complete. Several key sections are not yet available, consequently there are a number of forward and backward references which are quite meaningless. The remaining sections will be handed out as soon as available.

The parts of the notes now available are not arranged in the correct order for easiest reading. The following rearrangement of sections should be made:

Some Useful Inequalities for Distribution Functions - p. 1a - 3a
A Lower Bound on the Tail of a Distribution - p. 1y - 9y
A Combination Theorem - p. 1m
Some Results on Determinants - p. 1b - 3b
Upper and Lower Bounds for Powers of a Matrix with Non-negative Elements
The Number of Sequences of a Given Length Characteristic for a Language with Independent Letters
The Probability of Error in Optimal Codes
Page with figures 1, 2 and 3
Zero Error Codes and the Zero Error Capacity - p. 1g - 6g
Theorem - p. 1h - 3h
Figure 4
Lower Bound for P_ef for a Completely Connected Channel [...] - p. 2r - 3r
[...] - p. 1k - 5k
Application of "Sphere-packing" Bounds to Feedback Case - p. 1p - 3p
Theorem - p. 1q - 4q
Theorem - p. 1j
A Result for the Memoryless Feedback Channel - p. 1r
Continuity of P_e opt as a function of transition probabilities - p. 1e
Codes of a fixed composition - p. 1f
Relation of P_e to n - p. 1i - 2i
Bound for P_e for Random Code by Simple Threshold Argument - [...]
A bound on P_e for a random code - p. 1d - 3d
The Feinstein Bound - pages 11 & 21
Relations Between Reliability and Minimum Word Separation - p. l2, 22, 62 & 72
Inequalities for Decodable Codes - p. 1n - 3n
Convexity of Channel Capacity as a Function of Transition Probabilities - p. 1o
A Geometric Interpretation of Channel Capacity - p. 1x - 6x
Log Moment Generating Function for the Square of a Gaussian Variate - p. p1 - p2
Upper Bound on P_e for Gaussian Channel by Expurgated Random Code - p. s1 - s2
Lower Bound on P_e in Gaussian Channel by Minimum Distance Argument - p. a1 - a2
The Sphere Packing Bound for the Gaussian Power Limited Channel - p. c1 - c5
The T-terminal Channel - p. f1 - f7
Conditions for Constant Mutual Information - p. 1066
Simple Proof - p. 1024

The following errata have been found:

p. 1y: line 10 > 1; line 11 for any positive <^; line 14 ^(1 - e p "-
p. 2y: line 8 V, <Y 2 <. . . . 7 %
p. Jw: lines 1, 2, 4, 7, 8, 9, 13, 17: subscripts on $ should be in line.
p. 2c: line 7 * log Prob n
p. 4c: Eq. (7) E( 8 ) - -^(s) log - - (ji - su«); Eq. (8) R(s) = £^(8) log q i (s)° 1 » n - («-l); line 6 dE , dR ^ - n' + six" + n' - . s ds ' ds * n 1 + (1-s) u M -u' ~; line 2 E(l) « j log p^ 1 + log d
page 3|: line 3 - log min. jT
page J*g: line 9 change max. to min.
Fig. 4: bottom line - change 3 to 2.
page 5K: equation (1) min V 1°U = 1

I would appreciate knowing any further errors of any sort that are found in the notes. I expect there are a good many there. I would also be interested to know of any parts that are particularly difficult to follow and perhaps need rewriting.

Claude E. Shannon

Bounds on the Tails of Martingales and Related Questions

Claude E. Shannon
Department of Electrical Engineering, Department of Mathematics and Research Laboratory of Electronics
Massachusetts Institute of Technology
Cambridge, Massachusetts

This paper is concerned with the problem of overbounding the probability that the sum of n dependent random variables exceeds a certain quantity. Certain restrictions are assumed concerning the distribution of the ith random variable $x_i$ conditional on the preceding random variables. As an example, we might have a gambler playing some "system" in which $x_i$ is his winning on the ith bet.
Suppose he can choose any distribution he desires for $x_i$ conditional on the preceding plays, $x_1, x_2, \ldots, x_{i-1}$, subject however to the conditions 1) it is a "fair" bet, $E(x_i \mid x_{i-1}, \ldots, x_1) = 0$; 2) there is a "house limit" on possible wins or losses for one bet, $P(x_i \mid x_{i-1}, \ldots, x_1) = 0$ for $x_i < L$ and $P(x_i \mid x_{i-1}, \ldots, x_1) = 1$ for $x_i \ge W$. It is desired to find an upper bound on the probability that the gambler's winnings will exceed a certain limit X after n bets. This bound will of course be a function of L, W, n and X but is to be independent of the system used.

Thought of another way, we can imagine the gambler mapping out a strategy, subject to the house rules, to try to maximize the probability of ending up after n bets with a total winning of X or more. If this is his object, he would clearly be wise, for example, if he ever reached the level X to not risk any future loss. This he could do by choosing a distribution function thereafter which is 0 for negative x and 1 for positive x.

We will find a bound for this problem and various other similar problems with different side constraints on the allowed distribution functions. The results have applications in various problems related to random walks, gambler's ruin problems and certain coding problems in information theory.

In the example above, the gambler's total capital forms a martingale because of the "fair bet" condition. Bounds on the tails of martingales are known in terms of the variances of the successive amounts won. The bounds we obtain are in terms of conditional moment generating functions. As such, they require more in the way of restrictions on the distributions (for the moment generating functions to exist), but give tighter bounds. Our bounds bear the same relation to the variance type bounds for martingales that the Chernoff bound does to the Chebycheff bound for sums of independent random variables.
The Main Inequality

The method we use is based on a bound for the tail of a distribution due to Chernoff (1). Let P(x) be the distribution function of a random variable x, and suppose that the moment generating function

$$\nu(s) = \int_{-\infty}^{\infty} e^{sx}\, dP(x)$$

exists over some s interval including the origin in its interior. This will certainly be true, for example, if $P(x) < e^{ax}$ for some $a > 0$ and sufficiently large negative x, and $1 - P(x) < e^{-bx}$ for some positive b and sufficiently large positive x.

We first derive a somewhat generalized formulation of the Chernoff bound. Let $\mu(s) = \log \nu(s)$ be the semi-invariant generating function.

Lemma 1: Suppose the semi-invariant generating function $\mu(s)$, for a random variable x, exists for $-a < s < b$ and does not exceed another differentiable function of s, $\mu_0(s)$. Thus $\mu(s) \le \mu_0(s)$. Then

$$\Pr[x \ge \mu_0'(s)] \le e^{\mu_0(s) - s\mu_0'(s)}, \qquad b > s \ge 0$$
$$\Pr[x \le \mu_0'(s)] \le e^{\mu_0(s) - s\mu_0'(s)}, \qquad -a < s \le 0$$

This result is like the Chernoff bound except for replacement of $\mu(s)$ by an upper bounding function $\mu_0(s)$, and may be proved by similar means. Thus by the generalized Chebycheff inequality, for $s \ge 0$,

$$e^{sX} \Pr[x \ge X] \le \int_X^{\infty} e^{sx}\, dP(x) \le \int_{-\infty}^{\infty} e^{sx}\, dP(x) = \nu(s) = e^{\mu(s)} \le e^{\mu_0(s)}$$

This is true for any X. Set $X = \mu_0'(s)$. Then

$$\Pr[x \ge \mu_0'(s)] \le e^{\mu_0(s) - s\mu_0'(s)}$$

A similar argument gives the dual inequality for negative s.

We now develop a formula for the moment generating function of the sum of a set of dependent random variables, $x = x_1 + x_2 + \ldots + x_n$, where the distribution function of $x_1, \ldots, x_n$ is given by

$$P(x_1, x_2, \ldots, x_n) = \Pr[x_1 \le x_1,\ x_2 \le x_2,\ \ldots,\ x_n \le x_n]$$

It is assumed that for this multivariate distribution the moment generating functions for various random variables conditional on others exist. To avoid notational complexity we carry out the proof only for n = 3, using x, y and z for the three random variables, but the method is clearly general. If $\nu(s)$ is the moment generating function for the sum variable $u = x + y + z$, then (all integrals are from $-\infty$ to $\infty$)

$$\nu(s) = \int e^{sx}\, dP(x) \int e^{sy}\, dP(y \mid x) \int e^{sz}\, dP(z \mid x, y)$$

The innermost integral is the moment generating function for z conditional on x and y, and may be denoted by $\nu_3(s \mid x, y)$ (the 3 referring to the third variable, z). Thus

$$\nu(s) = \int e^{sx}\, dP(x) \int e^{sy}\, \nu_3(s \mid x, y)\, dP(y \mid x)$$

Suppose now that we have a bounding function for $\nu_3(s \mid x, y)$, say $\gamma_3(s)$, independent of x and y:

$$\nu_3(s \mid x, y) \le \gamma_3(s)$$

Then the innermost integral may be bounded by $\gamma_3(s)$ and this term taken out of the integration. ($\nu_3$ is clearly non-negative, being an expectation of $e^{sz}$.) Thus

$$\nu(s) \le \gamma_3(s) \int e^{sx}\, dP(x) \int e^{sy}\, dP(y \mid x)$$

Similarly, suppose the moment generating function of y conditional on x is bounded by $\gamma_2(s)$,

$$\nu_2(s \mid x) = \int e^{sy}\, dP(y \mid x) \le \gamma_2(s)$$

and the moment generating function of x is bounded by $\gamma_1(s)$,

$$\int e^{sx}\, dP(x) \le \gamma_1(s)$$

Then these may also be used to bound the integrals, giving

$$\nu(s) \le \gamma_1(s)\, \gamma_2(s)\, \gamma_3(s)$$

Taking logarithms, the semi-invariant generating function $\mu(s)$ for the sum variable u is therefore bounded by the sum of the logarithms of the $\gamma(s)$ functions, that is, by uniform bounds on the conditional semi-invariant functions for the different variables:

$$\mu(s) \le \mu_1(s) + \mu_2(s) + \mu_3(s)$$

The same argument carries through for the sum of any number of random variables and may be summarised as follows.

Lemma 2: The semi-invariant generating function $\mu(s)$ for the sum of n random variables is bounded by

$$\mu(s) \le \sum_{i=1}^{n} \mu_i(s)$$

where $\mu_i(s)$ is a uniform bound on the semi-invariant function for the ith random variable conditional on the first i - 1:

$$\log \int e^{s x_i}\, dP(x_i \mid x_1, x_2, \ldots, x_{i-1}) \le \mu_i(s)$$

In most applications the same bound, say $\mu_0(s)$, will apply to all the random variables. In this case $\mu(s) \le n \mu_0(s)$.
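Lemma 2 can be checked by direct enumeration on a small dependent example. In the sketch below the betting system (`stake`) is a hypothetical one chosen only for illustration: each bet is a fair two-point wager of size 1 or 0.5 depending on the running total, so every conditional moment generating function is cosh(cs) with c at most 1 and is therefore bounded by cosh(s). The lemma then asserts that the moment generating function of the sum is at most cosh(s) to the nth power.

```python
import math

def stake(total):
    """Hypothetical betting system: bet 1 when not ahead, 0.5 when ahead."""
    return 1.0 if total <= 0 else 0.5

def mgf_of_sum(n, s):
    """Exact E[exp(s*u)] for u = x_1 + ... + x_n under the system above."""
    dist = {0.0: 1.0}                      # running total -> probability
    for _ in range(n):
        nxt = {}
        for total, prob in dist.items():
            c = stake(total)
            for step in (c, -c):           # fair two-point bet at +c and -c
                key = total + step
                nxt[key] = nxt.get(key, 0.0) + 0.5 * prob
        dist = nxt
    return sum(prob * math.exp(s * total) for total, prob in dist.items())

def lemma2_bound(n, s):
    """n-fold product of the uniform conditional bound cosh(s)."""
    return math.cosh(s) ** n

if __name__ == "__main__":
    for s in (-1.0, 0.5, 1.5):
        print(s, mgf_of_sum(6, s), lemma2_bound(6, s))
```

The bets here are dependent, since each stake depends on the earlier outcomes, yet the product bound still holds.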
Combining Lemmas 1 and 2 we obtain our first main result, a bound on the tail probability of a sum of dependent random variables provided the conditional moment generating functions exist.

Theorem 1: If u is the sum of n dependent random variables $x_i$ (i = 1, 2, ..., n) whose semi-invariant generating functions conditional on preceding variables, $\mu_i(s \mid x_1, \ldots, x_{i-1})$, exist and are bounded by differentiable functions $\mu_i(s)$ (i = 1, 2, ..., n), then

$$\Pr\Big[u \ge \sum_i \mu_i'(s)\Big] \le e^{\sum_i \mu_i(s) - s \sum_i \mu_i'(s)}, \qquad s \ge 0$$
$$\Pr\Big[u \le \sum_i \mu_i'(s)\Big] \le e^{\sum_i \mu_i(s) - s \sum_i \mu_i'(s)}, \qquad s \le 0$$

Applications

In applications of this result we would generally attempt to find the smallest bounding functions $\mu_i(s)$ in order to obtain the tightest bound on the tail probability. As a first example consider a gambler allowed to choose a wager with an arbitrary distribution function $\phi(x)$ (the probability of gaining x or less), subject however to the following conditions:

1) The expected gain is zero, $\int x\, d\phi(x) = 0$.
2) $\phi(x) \le \phi_1(x)$, where $\phi_1(x)$ is a distribution function with negative mean for which $\int e^{sx}\, d\phi_1(x)$ exists for some negative s.
3) $\phi(x) \ge \phi_2(x)$, where $\phi_2(x)$ is a distribution function with positive mean for which $\int e^{sx}\, d\phi_2(x)$ exists for some positive s, and $\phi_2(x) \le \phi_1(x)$.

Thus our gambler is allowed to choose a distribution function at each wager lying between two given curves $\phi_1(x)$ and $\phi_2(x)$ (as suggested by Fig. 1), which approach 0 and 1 with a certain rapidity. He is also constrained to choose a distribution function with zero mean. The situation described earlier involving house limits is a case of this type where the distributions $\phi_1$ and $\phi_2$ are step functions at L and W, the maximum allowed loss or win per wager.

To apply the theorem we need a function which bounds the moment generating functions which he can achieve with these restrictions.
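Consider the distribution function $\phi_0(x)$ defined as follows:

$$\phi_0(x) = \phi_1(x), \qquad x < \alpha$$
$$\phi_0(x) = k, \qquad \alpha \le x \le \beta$$
$$\phi_0(x) = \phi_2(x), \qquad x > \beta$$

where $\alpha$ is the first point at which $\phi_1(x)$ reaches the value k and $\beta$ is the first point at which $\phi_2(x)$ reaches k. $\phi_0(x)$ is a distribution function, and by adjusting k we can clearly make the mean of the distribution $\phi_0(x)$ equal zero. With this value of k we will show that the moment generating function for any allowed $\phi(x)$ is bounded by that for $\phi_0(x)$.

Since $\phi(x)$ and $\phi_0(x)$ have the same mean (namely zero), we have, integrating by parts,

$$0 = \int_{-\infty}^{\infty} x\, d\big(\phi_0(x) - \phi(x)\big) = \Big[x\big(\phi_0(x) - \phi(x)\big)\Big]_{-\infty}^{\infty} - \int_{-\infty}^{\infty} \big(\phi_0(x) - \phi(x)\big)\, dx$$

so that

$$\int_{-\infty}^{\infty} \big(\phi_0(x) - \phi(x)\big)\, dx = 0$$

where we use the exponential approach of $\phi$ and $\phi_0$ to 0 and 1 as x goes to $-\infty$ and $+\infty$ to insure the vanishing of the term $x(\phi_0(x) - \phi(x))$ at these limits.

Now consider the quantity (again using integration by parts)

$$\int_{-\infty}^{\infty} e^{sx}\, d\big(\phi_0(x) - \phi(x)\big) = -s \int_{-\infty}^{\infty} e^{sx} \big(\phi_0(x) - \phi(x)\big)\, dx = -s \int_{-\infty}^{\delta} e^{sx} \big(\phi_0 - \phi\big)\, dx - s \int_{\delta}^{\infty} e^{sx} \big(\phi_0 - \phi\big)\, dx, \qquad -a \le s \le b$$

Here $-a$ and b are the limits of existence of the moment generating functions, and $\delta$ is the first point at which $\phi(x)$ reaches the horizontal segment of $\phi_0$, so that $\phi_0 - \phi$ is positive or zero for $x < \delta$ and negative or zero for $x > \delta$.

The first term, $-s \int_{-\infty}^{\delta} e^{sx} (\phi_0 - \phi)\, dx$, is greater than or equal to $-s\, e^{s\delta} \int_{-\infty}^{\delta} (\phi_0 - \phi)\, dx$, since, when s is positive, $e^{s\delta} \ge e^{sx}$ for $x < \delta$, $\phi_0 - \phi$ is positive and the coefficient $-s$ is negative. If s is negative, $e^{s\delta} \le e^{sx}$ in this range and the coefficient $-s$ is positive, so the same inequality holds. In a similar way, the second term is greater than or equal to

$$-s\, e^{s\delta} \int_{\delta}^{\infty} \big(\phi_0(x) - \phi(x)\big)\, dx$$

as one verifies by examination of the two cases $s \ge 0$ and $s \le 0$, remembering that $\phi_0(x) - \phi(x)$ is negative or zero in this range. Thus we conclude

$$\int_{-\infty}^{\infty} e^{sx}\, d\big(\phi_0(x) - \phi(x)\big) \ge -s\, e^{s\delta} \int_{-\infty}^{\infty} \big(\phi_0(x) - \phi(x)\big)\, dx = 0$$

In other words, the moment generating function for the distribution $\phi_0(x)$ dominates that of any other distribution with the same mean as $\phi_0$ and bounded by the $\phi_1$ and $\phi_2$ curves. Therefore the moment generating function for $\phi_0$ may be used in our bounds for the tail of a sum distribution if the individual conditional distributions satisfy this type of restriction.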
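The domination property just proved can be illustrated numerically in the house-limit case, where the two bounding curves are unit steps at L and W and the extremal zero-mean distribution puts mass W/(W - L) at L and -L/(W - L) at W. The sketch below is an illustration under that assumption, not the author's construction for general bounding curves; the helper names and the way random zero-mean wagers are generated (as mixtures of zero-mean two-point bets inside the limits) are hypothetical choices.

```python
import math
import random

def mgf(points, probs, s):
    """Moment generating function of a discrete distribution."""
    return sum(p * math.exp(s * x) for x, p in zip(points, probs))

def extremal_mgf(L, W, s):
    """MGF of the zero-mean two-point law on {L, W}, with L < 0 < W."""
    return (W * math.exp(s * L) - L * math.exp(s * W)) / (W - L)

def random_zero_mean_law(L, W, rng, pairs=3):
    """Mixture of zero-mean two-point laws with support inside [L, W]."""
    weights = [rng.random() + 0.1 for _ in range(pairs)]
    total = sum(weights)
    points, probs = [], []
    for w in weights:
        a = L * (0.05 + 0.95 * rng.random())   # strictly negative, not below L
        b = W * (0.05 + 0.95 * rng.random())   # strictly positive, not above W
        points += [a, b]
        # masses b/(b-a) at a and -a/(b-a) at b give this pair zero mean
        probs += [w / total * b / (b - a), w / total * (-a) / (b - a)]
    return points, probs

def dominated(L=-1.0, W=2.0, trials=200, seed=0):
    """True if no sampled wager beats the extremal MGF on a grid of s values."""
    rng = random.Random(seed)
    grid = [k / 4 for k in range(-12, 13)]
    for _ in range(trials):
        pts, pr = random_zero_mean_law(L, W, rng)
        if any(mgf(pts, pr, s) > extremal_mgf(L, W, s) * (1 + 1e-12) for s in grid):
            return False
    return True

if __name__ == "__main__":
    print(dominated())
```

A random search of this kind can only fail to find a counterexample; the proof above is what establishes the inequality for every allowed wager.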
Using this bound on the conditional moment generating functions in Theorem 1 our solution may be summarised as follows. Suppose at each play of a game the distribution functions available to a gambler all have zero mean and lie between two functions $\phi_1(x)$ and $\phi_2(x)$. Let $\phi_0(x)$ be the zero mean function consisting of $\phi_1$ followed by a flat segment, followed by $\phi_2$. Let

$$\mu(s) = \log \int_{-\infty}^{\infty} e^{sx}\, d\phi_0(x)$$

Then the probability of his winnings after n wagers exceeding $n\mu'(s)$ is bounded by

$$\Pr[u \ge n\mu'(s)] \le e^{n[\mu(s) - s\mu'(s)]}, \qquad s \ge 0$$

This same bound applies, of course, also with a semi-martingale condition, that is, if the gambler's expectation is only required to be non-positive.

If $\phi_1(0) = 1$ and $\phi_2(0) = 0$ (so the gambler can play a wager that amounts to stopping the game, that is, a distribution which is a unit step at zero), then this same bound applies to the probability of exceeding $n\mu'(s)$ on any of the first n trials. This is because the bound covers all strategies. Any particular strategy could be modified so that if the gambler reaches the level $n\mu'(s)$ at any time before the nth trial he then effectively holds his winnings by playing the distribution with unit step at zero. The bound must exceed the probability of exceeding the level $n\mu'(s)$ for this strategy at the nth step, but for this strategy that is the probability of ever exceeding the level in the first n steps. This device can be used in many applications of the method we are describing, provided only that the unit step at zero is an allowed distribution function.

The bound given, while certainly not the best possible for all values of the parameters, is, however, best possible in the coefficient of n in the exponent. That is, the result would be false if $\mu(s) - s\mu'(s)$ in the right hand exponential term were replaced by $\mu(s) - s\mu'(s) - \epsilon$ for any positive $\epsilon$. This may be seen as follows. The gambler could, within the rules, choose the distribution $\phi_0(x)$ at each wager.
If he does so, then we have a sum of n independent random variables, each with semi-invariant generating function $\mu(s)$. Lower bounds on the tail of this sum distribution are known to exceed $e^{n[\mu(s) - s\mu'(s) - \epsilon]}$ when n is sufficiently large. (2)

The Case with House Limits on Win or Loss for each Wager

For the case of the gambler who can choose an arbitrary distribution with zero mean and house limits on wins and losses W and L (L < 0) respectively, the distribution to maximize $\mu(s)$ is, from the above analysis, a binomial distribution with jumps at the ends of the interval W and L adjusted to give a zero mean. The two probabilities are $\frac{W}{W - L}$ at L and $\frac{-L}{W - L}$ at W.

To gain a little in generality and simplify notation, consider a binomial with probability p at value L and probability q = (1 - p) at W. The semi-invariant generating function is

$$\mu(s) = \log\big(p e^{sL} + q e^{sW}\big)$$
$$\mu'(s) = \frac{pL e^{sL} + qW e^{sW}}{p e^{sL} + q e^{sW}}$$

The expression for the bound on the tail may be simplified by a change of variables eliminating s. Let

$$\lambda = \frac{p e^{sL}}{p e^{sL} + q e^{sW}}, \qquad \eta = 1 - \lambda = \frac{q e^{sW}}{p e^{sL} + q e^{sW}}$$

Then

$$\frac{\lambda}{\eta} = \frac{p}{q}\, e^{s(L - W)}, \qquad s = \frac{1}{L - W} \log \frac{\lambda q}{\eta p}, \qquad \mu'(s) = \lambda L + \eta W$$

$$\mu(s) - s\mu'(s) = \log\big(p e^{sL} + q e^{sW}\big) - s(\lambda L + \eta W) = \lambda \log \frac{p}{\lambda} + \eta \log \frac{q}{\eta}$$

Letting p equal $\frac{W}{W - L}$ and q equal $\frac{-L}{W - L}$ and using our result bounding the tail of the sum of n random variables, we obtain the following bound for the probability of the gambler exceeding a certain level after n wagers:

$$\Pr[u \ge n(\lambda L + \eta W)] \le \left[ \left( \frac{W}{(W - L)\lambda} \right)^{\lambda} \left( \frac{-L}{(W - L)\eta} \right)^{\eta} \right]^{n}, \qquad \lambda \le p, \quad \eta = 1 - \lambda$$

If L = -W, that is, the win and loss limits are the same, this formula can be simplified somewhat at the expense of a certain weakening. It then becomes

$$\Pr[u \ge nW(1 - 2\lambda)] \le \left[ (2\lambda)^{\lambda} (2\eta)^{\eta} \right]^{-n}$$

Let $\lambda = \frac{1}{2}(1 - \theta)$, $\eta = \frac{1}{2}(1 + \theta)$. Then

$$\Pr[u \ge nW\theta] \le \left[ (1 + \theta)^{(1 + \theta)} (1 - \theta)^{(1 - \theta)} \right]^{-n/2} = e^{-\frac{n}{2}[(1 + \theta) \ln (1 + \theta) + (1 - \theta) \ln (1 - \theta)]}$$

Consider the bracketed term in the exponent and expand the logarithms as series.

$$(1 + \theta) \ln (1 + \theta) + (1 - \theta) \ln (1 - \theta) = (1 + \theta)\left(\theta - \frac{\theta^2}{2} + \frac{\theta^3}{3} - \ldots\right) + (1 - \theta)\left(-\theta - \frac{\theta^2}{2} - \frac{\theta^3}{3} - \ldots\right)$$
$$= \theta^2 + \frac{\theta^4}{6} + \frac{\theta^6}{15} + \ldots + \frac{\theta^{2n}}{n(2n - 1)} + \ldots \ge \theta^2$$

Hence

$$\Pr[u \ge nW\theta] \le e^{-n\theta^2/2}, \qquad \theta \ge 0$$

It may be noted that this bound is similar to the exponential part of the normal approximation to the sum of n binomial samples, probabilities $\frac{1}{2}$ at $\pm W$, without, however, the coefficient term that would ordinarily appear. This might suggest that the gambler's best strategy to maximize the probability of exceeding $nW\theta$ would be to continually play the extreme binomial distribution, or at least until he was within W of it and then switch to a binomial which would just carry him over the limit if he won. While this appears to be a rather good strategy, it is not quite optimal, as a study of small n values reveals. Determining the optimal strategy appears to involve considerable combinatorial complexity.

The Probability of Ever Exceeding a Limit with a Negative Expectation

Suppose now that the conditional expectation of all wagers is negative and we are interested in a bound on the probability of ever (in an infinite series of wagers) exceeding a certain (positive) value. If the expectation were zero, then by well known results in the gambler's ruin problem the only bound is unity, provided, for example, the gambler can play a binomial distribution. With a negative mean, however, significant bounds can be obtained as follows.

We consider the case again where the allowed distribution functions must lie between two given distribution functions $\phi_1(x)$ and $\phi_2(x)$ but now must have a mean m < 0. The maximum $\mu(s)$ is obtained by the same construction using $\phi_1$ and $\phi_2$, but with a placement of the horizontal segment to give the mean m. If $\phi(0)$ is 1, then $\phi_2(0)$ must have been 1, and no allowed bet whatever will ever give a positive return. Thus clearly the probability of ever exceeding any positive bound is zero. We will therefore assume that $\phi(0) < 1$.
This assumption also excludes $\phi(x)$ being a unit step, since the step would have to occur at the negative number m, making $\phi(0)$ equal 1. Under the assumption $\phi(0) < 1$, the $\mu(s)$ curve has the general form shown in Fig. 2.

The curve is convex downward; it passes through zero at s = 0 with a negative slope m; it has a unique minimum at $s = s_1$ (say); and passes through zero again at $s_0 > s_1$. These facts follow readily from the relations

$$\nu(s) = \int e^{sx}\, d\phi(x), \qquad \nu(0) = \int d\phi(x) = 1$$
$$\nu'(s) = \int x e^{sx}\, d\phi(x), \qquad \nu'(0) = \int x\, d\phi(x) = m$$
$$\nu''(s) = \int x^2 e^{sx}\, d\phi(x)$$
$$\mu(s) = \ln \nu(s), \qquad \mu(0) = 0$$
$$\mu'(s) = \frac{\nu'(s)}{\nu(s)}, \qquad \mu'(0) = m$$
$$\mu''(s) = \frac{\nu(s)\nu''(s) - \nu'(s)^2}{\nu(s)^2}$$

The numerator of $\mu''(s)$ is positive by using the Schwarz inequality (the unit step which would give zero being excluded). Hence the $\mu$ curve is strictly convex downward. Also, for sufficiently large positive s, $\nu(s)$ will exceed 1 and $\mu(s)$ will be positive, since $\phi(0) < 1$. Consequently, the minimum at $s = s_1$ and the positive zero crossing at $s = s_0$ both exist.

Suppose we are interested in a bound on the probability of ever reaching or exceeding A with the sums $u_1 = x_1$, $u_2 = x_1 + x_2$, ..., $u_n = x_1 + x_2 + \ldots + x_n$, .... We have

$$\Pr[\text{any } u_n \ge A] \le \sum_{n=1}^{\infty} \Pr[u_n \ge A]$$

From our above results

$$\Pr[u_n \ge A] \le e^{n[\mu(s) - s\mu'(s)]}$$

for the s such that $A = n\mu'(s)$. The particular n for which this bound is largest may be obtained by maximizing $n[\mu(s) - s\mu'(s)]$ given $A = n\mu'(s)$, or, in other words, maximizing $A\left[\frac{\mu(s)}{\mu'(s)} - s\right]$. Since $\mu''(s) > 0$ this maximum exists and occurs at a unique s found by differentiation, namely, the s for which $\mu(s) = 0$. This s is the $s_0$ of Fig. 2, and the corresponding n we call $n_0$. Thus $s_0$ and $n_0$ satisfy

$$\mu(s_0) = 0, \qquad n_0\, \mu'(s_0) = A$$

In general, $n_0$ will not be an integer, but the bound obtained for evaluation at $n_0$ and $s_0$ certainly is greater than that for any integer points. Hence for any particular n,

$$\Pr[u_n \ge A] \le e^{-s_0 A}$$

Now consider the $s_1$ where $\mu'(s_1) = 0$ (Fig. 2) and $n_1$ defined by [equation not legible in the scan]. Again, in general, $n_1$ will not be an integer.
We let, however, $[n_1]$ denote the largest integer contained in $n_1$. Returning to our inequality on the probability of $u_n$ ever exceeding A we may rewrite as follows:

$$\Pr[\text{any } u_n \ge A] \le \sum_{n=1}^{\infty} \Pr[u_n \ge A] = \sum_{n=1}^{[n_1]} \Pr[u_n \ge A] + \sum_{n=[n_1]+1}^{\infty} \Pr[u_n \ge A]$$

Each of the first $[n_1]$ terms is bounded by $e^{-s_0 A}$, and the remaining terms are bounded by a geometric series in $e^{\mu(s_1)}$, giving

$$\Pr[\text{any } u_n \ge A] \le e^{-s_0 A} \left[ n_1 + \frac{1}{1 - e^{\mu(s_1)}} \right]$$

This is our desired bound. It is essentially exponentially decreasing in A. In fact more refined analysis can be given to show that the bracketed term can be replaced by a more involved expression which does not increase with A.

References

(1) Chernoff, H. (1952). A Measure of Asymptotic Efficiency for Tests of a Hypothesis Based on the Sum of Observations. Ann. Math. Stat. 23, 493-507.

7. Some Useful Inequalities for Distribution Functions

In this section a number of inequalities will be given which are useful in estimating the "tails" of distribution functions or other related statistics.

Binomial Inequalities: Let

$$G = \frac{1}{\sqrt{2\pi n \lambda \mu}}\, \lambda^{-\lambda n} \mu^{-\mu n} \qquad (1)$$

Then

$$G \exp\left[-\left(\frac{1}{12\lambda n} + \frac{1}{12\mu n}\right)\right] \le \binom{n}{\lambda n} \le G \qquad (2)$$

and

$$\frac{\sqrt{\pi}}{2}\, G \le \binom{n}{\lambda n}$$

where $\mu = 1 - \lambda$, and neither $\lambda$ nor $\mu$ is zero. (Note that if either is zero, G is undefined.) Similar inequalities hold for the terms of a binomial distribution, $\binom{n}{\lambda n} p^{\lambda n} q^{\mu n}$, and may be obtained by multiplying the above inequalities by $p^{\lambda n} q^{\mu n}$. They may also be generalized to the multinomial coefficient:

$$G_t = (2\pi n)^{-(t-1)/2} \prod_{i=1}^{t} \lambda_i^{-(\lambda_i n + 1/2)} \qquad (3)$$

$$G_t \exp\left[-\sum_{i=1}^{t} \frac{1}{12\lambda_i n}\right] \le \frac{n!}{\prod_i (\lambda_i n)!} \le G_t \qquad (4)$$

where t is the number of components, $\sum_i \lambda_i = 1$ and none of the $\lambda_i$ vanishes.

The "tail" of a binomial distribution may be estimated by the following formulas:

$$\sum_{k=\lambda n}^{n} \binom{n}{k} p^k q^{n-k} \le \frac{1}{1 - \frac{\mu p}{\lambda q}}\, G\, p^{\lambda n} q^{\mu n}, \qquad \lambda > p \qquad (5)$$

$$\sum_{k=\lambda n}^{n} \binom{n}{k} p^k q^{n-k} \le \left(\frac{p}{\lambda}\right)^{\lambda n} \left(\frac{q}{\mu}\right)^{\mu n}, \qquad \text{provided } \lambda > p \qquad (6)$$

The first of these gives a closer estimate of the tail but is somewhat more complex. The inequality (6) (Chernoff) is often convenient because of its simplicity.
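The binomial inequalities can be spot-checked by direct computation. The sketch below assumes the inequalities in their standard form, namely the bounds on the binomial coefficient in terms of G, including the weaker lower bound with the factor sqrt(pi)/2, and the Chernoff tail bound (6); the function names are hypothetical and k plays the role of the integer lambda times n.

```python
import math

def G(n, k):
    """The quantity G of inequality (1), with lambda = k/n and mu = 1 - lambda."""
    lam, mu = k / n, (n - k) / n
    return lam ** (-k) * mu ** (-(n - k)) / math.sqrt(2 * math.pi * n * lam * mu)

def coefficient_bounds_hold(n):
    """Check both lower bounds and the upper bound on C(n, k) for 0 < k < n."""
    for k in range(1, n):
        c = math.comb(n, k)
        g = G(n, k)
        lower = g * math.exp(-1 / (12 * k) - 1 / (12 * (n - k)))
        weak = g * math.sqrt(math.pi) / 2
        # the weak bound is met with equality at n = 2, k = 1, hence the tolerance
        if not (lower <= c <= g and weak <= c * (1 + 1e-12)):
            return False
    return True

def tail(n, p, k):
    """Exact binomial tail: sum over j >= k of C(n, j) p^j q^(n-j)."""
    q = 1 - p
    return sum(math.comb(n, j) * p ** j * q ** (n - j) for j in range(k, n + 1))

def chernoff_tail_bound(n, p, k):
    """Right member of (6) with lambda = k/n; requires p < k/n < 1."""
    lam, mu, q = k / n, (n - k) / n, 1 - p
    return math.exp(k * math.log(p / lam) + (n - k) * math.log(q / mu))

if __name__ == "__main__":
    print(coefficient_bounds_hold(40))
    print(tail(50, 0.3, 25), chernoff_tail_bound(50, 0.3, 25))
```

The case n = 2, k = 1 is the one in which the weaker lower bound is met with equality, so a small tolerance is allowed there.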
Lower bounds for tails may be taken to be merely lower bounds for the first term as in the lower inequalities of (2) or (4).

We shall now prove the inequalities (1) and (2). The Stirling approximation for n! is as follows:

$$n! = \sqrt{2\pi}\, n^{n + 1/2} e^{-n} \exp\left[\frac{1}{12n} - \frac{1}{360n^3} + \ldots\right]$$

It is known that if no terms of the series are taken, n! is underestimated; if only the $\frac{1}{12n}$ term is taken, then n! is overestimated, and so on. We wish to overestimate $n!/((\lambda n)!\,(\mu n)!)$. This will be done if the numerator is overestimated and the denominator underestimated. Thus we may write

$$n! \le \sqrt{2\pi}\, n^{n + 1/2} e^{-n} e^{\frac{1}{12n}}$$

or

$$\frac{n!}{(\lambda n)!\,(\mu n)!} \le \frac{1}{\sqrt{2\pi n \lambda \mu}}\, \lambda^{-\lambda n} \mu^{-\mu n} \exp\left[\frac{1}{12n} - \frac{1}{12\lambda n} - \frac{1}{12\mu n} + \frac{1}{360(\lambda n)^3} + \frac{1}{360(\mu n)^3}\right]$$

We wish to show that the exp term is less than or equal to one, or, which is the same thing, that its argument is less than or equal to zero. One or the other of $\lambda$, $\mu$ is the greater. From symmetry, we may assume without loss of generality that it is $\lambda$, that is, $\lambda \ge \mu$. Then

$$\frac{1}{360(\lambda n)^3} \le \frac{1}{360(\mu n)^3}$$

and since $\mu n$ is a positive integer,

$$\frac{1}{360(\mu n)^3} \le \frac{1}{360\, \mu n}$$

Further, $\frac{1}{12n} - \frac{1}{12\lambda n} \le 0$, since $\lambda n \le n$. Using these, the argument of the exponential is at most

$$-\frac{1}{12\mu n} + \frac{2}{360\, \mu n} \le 0$$

This proves the upper bound (2). The lower bound is found similarly by underestimating the numerator and overestimating the denominator. No terms of the series are used for n! and the $\frac{1}{12\lambda n}$ and $\frac{1}{12\mu n}$ terms are used for the denominator. This gives directly the lower bound in (2).

The other lower bound, with $\sqrt{\pi}/2$ in place of the exp term, is obtained by noting first that unless both $\lambda n$ and $\mu n$ are less than or equal to two, the argument of the exponential, $\frac{1}{12\lambda n} + \frac{1}{12\mu n}$, is at most $\frac{1}{12} + \frac{1}{36} = \frac{1}{9}$. Now $\exp(-\frac{1}{9}) > \sqrt{\pi}/2$, and it is also readily verified that the result is true for the four cases where both $\lambda n$ and $\mu n$ do not exceed two, namely (2,2), (2,1), (1,2) and (1,1). The worst case is (1,1), which just gives equality. Hence the result is true in general.

The upper and lower bounds (4) for the multinomial are found in exactly the same way as for the binomial.
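The alternating property of the Stirling series on which this proof rests can be checked numerically. The sketch below works with logarithms and compares log n! against the approximation with no correction term, with the 1/(12n) term, and with the further -1/(360n^3) term; the function names and the range of n tested are hypothetical choices.

```python
import math

def stirling_brackets(n):
    """Return the zero-, one- and two-term log approximations and log n!."""
    base = 0.5 * math.log(2 * math.pi * n) + n * math.log(n) - n
    one = base + 1 / (12 * n)
    two = one - 1 / (360 * n ** 3)
    return base, one, two, math.log(math.factorial(n))

def alternation_holds(n_max=25):
    """True if n! is under-, over- and under-estimated in turn for 1 <= n <= n_max."""
    for n in range(1, n_max + 1):
        base, one, two, exact = stirling_brackets(n)
        if not (base < exact < one and two < exact):
            return False
    return True

if __name__ == "__main__":
    print(alternation_holds())
```

The upper limit on n is kept small so that the gap between log n! and the two-term approximation stays well above floating-point rounding error.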
The tail inequality for the binomial is found by overestimating the tail using an infinite geometric series. This process is familiar (see, for example, Feller) with G replaced by the binomial coefficient. The inequality (6) is a special case of Chernoff's inequality which will be discussed later more generally.

A Lower Bound on the Tail of a Distribution

Let $\mu(s)$ be the logarithm of the moment-generating function of a distribution F(x), and assume $\mu(s)$ exists in an interval with s = 0 in its interior. Then

$$dF(x) = e^{\mu(s)} e^{-sx}\, dG(x) \qquad (1)$$

where G(x) is the distribution of the tilted random variable obtained from F(x) by the $e^{sx}$ multiplying operation and normalisation. G(x) has its mean at $\mu'(s)$ and its variance is $\sigma^2 = \mu''(s)$. By the Chebycheff inequality

$$G\big(\mu'(s) + \alpha\sqrt{\mu''(s)}\big) - G\big(\mu'(s) - \alpha\sqrt{\mu''(s)} - 0\big) \ge 1 - \frac{1}{\alpha^2}$$

for any positive $\alpha$. Now integrate equation (1) from $\mu'(s) - \alpha\sqrt{\mu''(s)} - 0$ to $\mu'(s) + \alpha\sqrt{\mu''(s)}$. This gives

$$F\big(\mu' + \alpha\sqrt{\mu''}\big) - F\big(\mu' - \alpha\sqrt{\mu''} - 0\big) = \int e^{\mu - sx}\, dG(x) \ge \left(1 - \frac{1}{\alpha^2}\right) e^{\mu - s\mu' - |s|\alpha\sqrt{\mu''}} \qquad (2)$$

This, then, is a lower bound on the probability for the F distribution in a small interval in terms of the logarithm of the moment generating function. If F is the convolution of n identical and independent distributions, each with $\mu(s)$ for its log moment generating function, then that for F itself is equal to $n\mu(s)$. The interval in question is then $2\alpha\sqrt{n\mu''}$ while the center position (for a fixed s) grows as $n\mu'(s)$.

If we integrate (1) from $-\infty$ to $\mu' + \alpha\sqrt{\mu''}$ and assume $s \le 0$, we obtain an underbound on the tail of the distribution F in the negative direction. This gives

$$F\big(\mu' + \alpha\sqrt{\mu''}\big) = \int_{-\infty}^{\mu' + \alpha\sqrt{\mu''}} e^{\mu - sx}\, dG(x) \ge e^{\mu - s\mu' + s\alpha\sqrt{\mu''}} \int_{\mu' - \alpha\sqrt{\mu''}}^{\mu' + \alpha\sqrt{\mu''}} dG(x) \ge \left(1 - \frac{1}{\alpha^2}\right) e^{\mu - s\mu' + s\alpha\sqrt{\mu''}} \qquad (3)$$

If F is a convolution of n identical distributions each with $\mu(s)$ as the logarithm of its moment generating function, this becomes

$$F\big(n\mu' + \alpha\sqrt{n\mu''}\big) \ge \left(1 - \frac{1}{\alpha^2}\right) e^{n(\mu - s\mu') + s\alpha\sqrt{n\mu''}}$$

Thus the argument of F approaches asymptotically for large n the argument $n\mu'$ appearing in the Chernoff upper bound.
Likewise the exponent on tne richi (;^nd the coefficient 1-3 can also be included as a term in the • exponent) - -preaches as^ncpl-tieslly the expcr^nt ir the Chernof f upper r x>urc,= j -Iocs-: fr^uaHties Slay also be extended to tno cr.se where F is a =onvc!!uticr. si r.ot necessarily identical distribute ons with functions Co) (i - 1, i t - * ft). Then for F itself we have u - V'jj.^ , v i« -2.^ -- d P ' 1 » and those may be substituted in (2) and (3). It is also evident that these same inequalities for &"2Q give a lower bound on the tail in the positive direction, that is, 1 - F(|i' f - 0). lover Bounce on i!ultinonr .»?. Tails and Tei •Suppoee we have a discrete distribution: a random variable can assume values' ^ <£.v 2 <~ - -<(v t with probabiiit -s p.,, p ? , - - »» p . We wish to establish a lower bound on the size of term that can be found in a email interval when this distribution is convolved with itself n times (that is, junns in the distribution of the sum of n independent variables, each with the given distribution) „ We first show the existf-n- oe of a term having a certain sise near the mean cf the convolved distribution. To do this, the following lemma is first proved., ■v- ~~ — ?arts cf these results were obtained in collaboration with Peter Slias. LegBR. i For any given n, we can find integers i^, xi^ t , such that K - ^ 1 C « ) 2 n i * n (2) nZ Pi v. ^^n ± v ± ^ n v ± + £l (3) where A = ^in v , .. - v . „ A i + 1 i Proof j We first find a set of integers at. which satisfy all the con- ditions except v i ■< n p.. v i * A , and will than derive from these the n^. Choose to be the first integer greater than p^n. Set m^ - p^n - 6^. Kext, choose m-. as the greatest integer less th~n p^n and set - p n n - 5 n .. If 6^ - 5^ 0, take another m from the low end (i, e„, m^), the largest integer less than p^n, and then calculate 6 t + + 6^ where 5 2 - - p^n. 
If this is positive, proceed with p , etc., until the accumulated sum of 6 fc s first becomes non~pos it ive. When this ••'Ccurs J . terms are taken from the top end of the v range (P t-1 > p t-2> etc *) the accumulated sum of 6's goes positive. This process is continued, alternating from one end to the other as the sum of the 6's changes sign, and eventually will end with some index k, having the property that all ^ for i<k satisfy n. - p i n - while for all with i£ k, we have - p^n = 6^ 0. At each stage of the operation, the total accumulated discrepancy satisf iesj^o^^ J £ | . This is true at the beginning, and arguing inductively at each stage we add a 6 of absolute value less than or equal to one to an accumulated of abso- lute value less than or equal to one and of opposite sign. This leads to the next accumulated sum also being less than or equal to one in absolute value. Hence, when the last assignment of is to be made, If we let - n - a^, then we satisfy^m, - n and also hr i / k \ • n -^Z, (np + 6 ) . i / k 1 1 - n -(n - np ) 6 K i / k 1 3 '*k + 9 H^ 1 Thus, | \|*^\ -lso. • Nor since > 5 • C. k have 1 1 h . n -X 6 - r: 6 1 1 n *■ 1 1 where h is the index cf the largest nejwtive & i , (eithar 5,. c* - Multiplying each side by v h and using the monotc. s ordering of t; e obtain % l>: - .h . t t - ^ 6 i -f- 5 i -h ■ £ 6 i 'b**^ 5 i * s J- J. h*l h + i Hence, using the end expressions in the above inequalities r and therefore t . 1 ' np i v i + >: 5 i v i t New starting with the m ± we can construct a set of which satisfy all the conditions of the lemma. Note first that all the 6 for i^ h are positive and for h are negative. 
If we replace one of the lower ' u ±> sa y m a ( a £h), by the next larger integer + 1 and simultaneously an n^b/h) by the next lower Integer ^ - 1, we retain the properties that the errors in approximation satisfy J 6 ± J £ 1 and that their sun be T.ero (or equivalency, 2k - *0« However, this reduces the value of!>m v x * — i i by an amount v b - v & . Starting with the set cf m ± just derived, we shall show how by interchanges of this type it is possible to go down f :• om the value'21m i Vl by steps none of which is lar-er than A, and eventually arrive at a sum less than or equal to n^P-j^ t ± . It will follow that in this sequence of operations there is a stage at which the third condition of the lenma obtains. T he series of steps is constructed as follows . Perform the inter- change operation on (the last negative n^) and + r Since V h+1 ~ v h^ ^» the chan 6 e the sum due to this change is less than or equal to A. Now in place of this interchange consider that of h against h + 2, or that of h - 1 against h + 1„ The additional change in these cases over that just considered is clearly less than or equal to A, being indeed v n + 2 ~ v h + 1 or T h ~ v h - r ^ 118x1 stage would involve adding to one end or the other of the interval already taken. This again changes the sum from that previously obtained by not more than.d. This process is continued until the ends of the range are reached, that is, v and v t 1 are used in the interchange <, These are nr/n left in the changed state and the process is started again with and + . Working outward from these eventually the nuabers m^ and m t _ are used. These are then left in the changed state (that is at rq^ * X and ffi^ _ - 1) and again the .'.rocess started at and ac.^ + „ This procedure is continued until the permanently changed m's from one end or the other reach e, or a, . so n n + 1 that .Vzrthsr steps of this type are not possible. 
The set of changed m^'s, si;' ie J | then existing have essentially the reverse property of the original set ; the corresponding s/ (that is - p^n) satisfy 6^ for Il£h : foj a certain h' „ Hence s using essentially the same argument we used in prov.'ng (U), we can show that t i Thus this series of steps has at souc stage given a set of integers such that 0^"*£J^ 6 -Cxd f namely, the integers at the stage just before this sua goes negative, For these we have, equivalently, n ■^-p i v i ^ . y — n v. ^ n -2-^ + A ° ' c his completes the proof of ti<e lemma. Returning now to the original problem, consider the term in the nth convolved distribution where thi value is taken times (i - 1, 2, — t), the n^ being those of the lemma. In the multinomial distribution this gives rise to a term of /n\ n, ? ^ n This inequality is an application of the general inequality proved previously ■ for mult; nomials-, We now wish to simplify this making use of the fact that the n^ are close to p^nj j * j n^ - p^n Consider the last terms in the exponential? \ log P. -27. log 5 , -S^ log (1 . \ } 6 i * ppi (since log (1 +-x)& Pi n (since « o) - -i5T i. The first exponential term can be estimated as follows. ^TFnT 12n^pJ "H7 We now assure that, for each i, p n^l (in other wor^s, that n^ 1 ). . min f0U ° , ' S ^ e " Ch "i? 1 K « - « I -a „. „ „ integer) Srtrt henna I n. - 6 p 5L - _i i i n ± n± n i - 1 + i_ ^2 Thus Finally the coefficient in (5) can be underbounded as follows. n i 6i - 1 / 2 -1/2 7 / ( o^- exp (-?2^i Collecting these various terms we have the following result j Theor em: The sviz of n independent random variables, each w.'.th the sais discre^e distrib r:ion, probability p. of value v^^ (i * 1, 2, - - t) O^j^T v^ p ) has a term in the closed interval frca^pj v i -° ^ P v ^ + & ■ ESac(^ . - v Jand the terra has a value at ieast r . ^ .. e 3n — p . f ^ A p^TTp i provided n ^? 
p~ ain This result cay be generalized to give a .era of such a dis -ributicn anywl^vY, in the possible range. This is dene by writing the dis :rib vtion in terras of the tilted distribution} the sua of independent random variables v . s — v^s Vth probabilitLib ^(s) - p^ 1 / ^ p^e As we have seen pre iously, the distribution function of the original sum, F (x) , is related ,o that of the tilted distribution function, r ' n '. t ), by the equation dF n (x) - e^ (s) e" Sx dG n (x) n n ' The G n distribution has a term in the internal A - n'-(s) to A + £, since jj.'(s) is the mean and the previous result applie., „ This gives a ,erm in the F r distribution, to the anoxint ctated in the following. Jheor^J ~he sun of n independent random variables, each with the same discrete distri vj'ion, probability p t of value v i , (v^ < v ivi^ (i - 1, 2, - - t), has a term in the closed interval from A to A ♦ ^1 where A » mx ( v , - v . ) and n v . ^A<. nv , The term will lave a ' i i*l i nun ^> max value at least v.s v,s where q^s) - p^e p^e 1 and s is chosen to make A -O^Cs) v i> and provided n^q^ (s). The last term is the Chernoff Vund with ^ v. s min H(s) - log<>p e 1 , ^'(s) - A A Coafcin atorial Theor-en Theorem; Suppose we have a set of objects S.,, S^, oc^S and a nusier of nuriBrically valued proportiee (functions) for the objects ? ia Pg,.*?^ These are aon=negative P. (S.) £ end we laiosr the averages of these properties over the objects: Then there assists an object £^ for vniioh P 4 (S ) <di. i - l f 2, d More generally given any set of K. > satisfying i«i i then there exists en object P i (S ? ) < i - 1, 2, BOO „ d Proof ; The second part implies the first by taking %. - d To prove the second part let l! ± be the cuaber of objects for which P^CS) > K^a^o New A ± > i H ± K ± A ± (sii»e all S »s have P i values > 0) . a Hence R. < — a Ki The total nucber of objects U violating any of the conditions is less than or equal to the sum of the individual N. 
l M < n ^~ f" - n ^sing ^ i. < 1 Hence there is at least one object not violating any of the conditions <> Sgn^s Resu lts cr> Determinants The root of a determinant equation., Leans: Given f .{») * 1,2, «.) continuous functions of w in the range a < X ■< L and in this range £.. ,{ta) > 0. > f ij^ fc ) > °s '^(a) < ^» f^(b) > d, "Chen there exists W, a < V, < b and a set of X. > Q,TX. * 1, such that i — i ^cof j Consider the d dimensional region P. whose points are (JL, Xd,. W) V7here X ± > 0>jT K ± * 1, a< ?: < bo This is a topological imace of a sphere and its interior c For a fixed W in the range from a to b . consider the continuous rapping ij 1 ^ v * w + 1 f .(iv)x. 1 fj id .1 ix a < V a < b a if f , < a l^bif ?1 > b Note that the denominator for Y^ does not vanish because of our assumption that ^ ^-(6°) > and hence the Y are rell defined Also the Y. are (X^tf) in R continuously into points (Y^V) in Ro Consequently, by the Erouwer fixed point theorem there exists a point (XJRf) which is napped into itself, that is, a point for which (W) - X . ^ (W) s Vi - V„ The value of W for the fixpoint clearly is not a or b since these points are moved upward or downward by our assumptions „ Hence for the fixpoint we have Iff « W + 1 - T" f (W)X, or T" f . .(W)X. « 1„ It follows ij iH that for the fixpoint Let the elements a.. . of a ratrix be non-negative e Suppose there is an eigen vector A all of whose components ere positive, a. > 0, &v6 the 1 * 2, ' corresponding characteristic value is K . fie trill show that for anv c * other characteristic value ^ we have |A_J £ \ . Let B. be a characteristic vector for ^ where r;e adjust the length of this vector as follows., Choose its length in such a way that A. - jB j S for all i and the equality holds for at least one i, say i « h, so that At - JB j It is clear that this can be done since with zero length all components of B are less than those of A and-' increasing continuously, eventually a first one of the jB.. 
reaches its corresponding $A_i$. We now have

$$\sum_i A_i a_{ij} = \lambda_0 A_j \tag{1}$$

$$\sum_i B_i a_{ij} = \lambda B_j \tag{2}$$

$$\sum_i |B_i|\, a_{ij} \ge |\lambda|\, |B_j| \tag{3}$$

Subtracting these equations for $j = h$,

$$\sum_i \bigl(A_i - |B_i|\bigr)\, a_{ih} \le \lambda_0 A_h - |\lambda|\, |B_h| = \bigl(\lambda_0 - |\lambda|\bigr) A_h \tag{4}$$

All terms in the sum at the left are non-negative and also $A_h$ is definitely positive. It follows that $\lambda_0 - |\lambda| \ge 0$.

The derivative of the eigenvalue of a matrix.

Suppose we have the square matrix $(a_{ij}(s))$ where the elements are differentiable functions of a parameter $s$. Let $V = V(s)$ be an eigenvalue with corresponding eigenvector $A_i = A_i(s)$ and eigenvector $B_i = B_i(s)$ for the transposed matrix. Thus

$$\sum_i A_i \bigl[a_{ij}(s) - V(s)\, \delta_{ij}\bigr] = 0 \tag{1}$$

$$\sum_i A_i a_{ij} = V A_j \tag{2}$$

$$\sum_j a_{ij} B_j = V B_i \tag{3}$$

Theorem:

$$V'(s) = \frac{\sum_{i,j} A_i\, a'_{ij}\, B_j}{\sum_i A_i B_i}.$$

To prove this, differentiate (2) with respect to $s$:

$$\sum_i A'_i a_{ij} + \sum_i A_i a'_{ij} = V' A_j + V A'_j.$$

Now multiply by $B_j$ and sum on $j$:

$$\sum_{i,j} A'_i a_{ij} B_j + \sum_{i,j} A_i a'_{ij} B_j = V' \sum_j A_j B_j + V \sum_j A'_j B_j.$$

Using (3) in the first term cancels the last term on the right, giving the desired result.

Upper and Lower Bounds for Powers of a Matrix with Non-negative Elements

We frequently have to deal with the $n$th power of a matrix whose elements are $\beta_{ij} \ge 0$. We denote the $ij$ element of this $n$th power by $\beta^{(n)}_{ij}$. We are concerned here with the case where the corresponding graph has the property that it is possible to go from any node $i$ to any other by a finite sequence $\beta_{ij}, \beta_{jk}, \ldots$, where all the $\beta$'s in this series are positive. This means that the graph consists of one ergodic or periodic set in the usual Markoff analysis. The non-negative condition on the $\beta_{ij}$ insures the existence of a real eigenvalue $v_0$ which is a solution of the determinant equation $|\beta_{ij} - v\, \delta_{ij}| = 0$. Further, this $v_0$ dominates in absolute value any other eigenvalue $v$, that is, $v_0 \ge |v|$. Corresponding to root $v_0$ there will exist right and left eigenvectors for the matrix:

$$\sum_i A_i \beta_{ij} = v_0 A_j, \qquad \sum_j \beta_{ij} B_j = v_0 B_i.$$

The conditions $\beta_{ij} \ge 0$ imply that all the $A_i$
be the same sign (or vanish) and all the $B_i$ be the same sign (or vanish). In both cases we take them to be positive (multiply by $-1$ if necessary). In the case satisfying the graphical condition it is easily seen that all $A_i$ and all $B_i$ are then actually positive (none vanish).

Theorem: Under the conditions above, i.e. $\beta_{ij} \ge 0$ and any state accessible from any other through a finite sequence of non-vanishing transitions, the element $\beta^{(n)}_{ij}$ of the $n$th power satisfies

$$\beta^{(n)}_{ij} \le v_0^{\,n}\, \frac{B_i}{B_j} \le v_0^{\,n} \left(\frac{v_0}{\beta_{\min}}\right)^{d},$$

where $\beta_{\min}$ is the smallest (non-vanishing) $\beta_{ij}$, and $d$ is an integer such that there is a path from any state $i$ to any state $j$ with not more than $d$ steps ($d - 1$ intermediate states). Furthermore, there will exist [constants illegible in the scan] and $n_0$ such that [lower bound illegible in the scan], provided either (1) for some $n$, $\beta^{(n)}_{ij} > 0$ for all $i, j$, or (2) the state diagram has no recurrent subsets (the greatest common divisor of closed path lengths is 1).

Proof: The first inequality is proved easily by induction on $n$. It holds for $n = 0$, since for $i \ne j$ the right member is positive and $\beta^{(0)}_{ij} = \delta_{ij} = 0$, while for $i = j$, $\beta^{(0)}_{ii} = 1$ and the right member is one. Now supposing the inequality to hold for $n$ we prove it for $n + 1$:

$$\beta^{(n+1)}_{ij} = \sum_k \beta_{ik}\, \beta^{(n)}_{kj} \le \sum_k \beta_{ik}\, \frac{B_k}{B_j}\, v_0^{\,n} = \frac{B_i}{B_j}\, v_0^{\,n+1}.$$

This is the corresponding inequality for $n + 1$, concluding the proof. The second inequality, that $B_i / B_j \le (v_0 / \beta_{\min})^d$, is shown as follows. From (1), let some $\beta_{ip}$ be positive; then [remainder of the argument missing from the scan].

The Number of Sequences of a Given Length

Suppose a number of letters are available whose lengths (or durations) are $a_1, a_2, \ldots, a_g$ and we wish a bound on the number $N(\ell)$ of sequences of total length $\ell$. Here it is assumed that any sequence of letters is allowed. $N(\ell)$ satisfies the difference equation

$$N(\ell) = N(\ell - a_1) + N(\ell - a_2) + \cdots + N(\ell - a_g),$$

as we see by noting that each sequence of length $\ell$ must end in one or another of the available letters. Furthermore, the boundary conditions may be taken to be $N(\ell) = 0$ for $\ell < 0$ and $N(0) = 1$.
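The difference equation and its boundary conditions can be evaluated directly. The following is a minimal sketch, assuming integer letter lengths so that the recursion terminates; the function name `count_sequences` and the example lengths are illustrative and do not come from the text.

```python
from functools import lru_cache


def count_sequences(lengths, total):
    """Return N(total), the number of letter sequences of total length `total`.

    `lengths` holds the letter lengths a_1, ..., a_g (assumed here to be
    positive integers, which is an assumption of this sketch only).
    """

    @lru_cache(maxsize=None)
    def n(l):
        if l < 0:
            return 0  # boundary condition: N(l) = 0 for l < 0
        if l == 0:
            return 1  # boundary condition: N(0) = 1 (the empty sequence)
        # Each sequence of length l ends in one of the available letters.
        return sum(n(l - a) for a in lengths)

    return n(total)


# Two letters of lengths 1 and 2: the counts follow the Fibonacci numbers.
print([count_sequences((1, 2), l) for l in range(7)])  # [1, 1, 2, 3, 5, 8, 13]
```

With lengths 1 and 2 the counts grow like the powers of the golden ratio, which illustrates the exponential growth that the bounds in this section are concerned with.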
Associated with the difference equation is the following characteristic equation:

$$1 = \sum_{i=1}^{g} X^{-a_i}.$$

Since all the $a_i$ are positive and real, the right-hand member is a strictly monotone decreasing function of $X$ and varies from $\infty$ to $0$ when $X$ goes from $0$ to $\infty$. Consequently, the characteristic equation has a unique positive real root $W$.

Theorem: $N(\ell) \le W^{\ell}$.

To prove this, note first that $W^{\ell}$ satisfies the difference equation, since this results on multiplying the characteristic equation (with $X$ replaced by $W$) by $W^{\ell}$. With regard to the boundary conditions, $W^0 = 1 = N(0)$ and $W^{\ell} > N(\ell)$ when $\ell < 0$. Let $a$ be the smallest of $a_1, a_2, \ldots, a_g$. Then it is possible to proceed by a kind of induction in steps of $\ell$ (each of length $a$) to show that the dominance of $W^{\ell}$ over $N(\ell)$ continues for all $\ell$. In fact, suppose that for $\ell \le \ell_1$ we have $N(\ell) \le W^{\ell}$. Then for $\ell$ in the range $\ell_1 < \ell \le \ell_1 + a$,

$$N(\ell) = N(\ell - a_1) + N(\ell - a_2) + \cdots + N(\ell - a_g) \le W^{\ell - a_1} + W^{\ell - a_2} + \cdots + W^{\ell - a_g} = W^{\ell}.$$

Since the inequality is true for $\ell \le 0$, it follows that it is true for all $\ell$.

A more general problem of the same sort relates to sequences which are subject to a finite state set of constraints. Thus, suppose there are $d$ states and that in state $i$, letters of lengths $\ell^{(\alpha)}_{ij}$ are permitted, leading to state $j$. The index $\alpha$ ranges over the different letters going from state $i$ to state $j$, and $j$ ranges over the different states which can follow state $i$. Now let $N_{ij}(\ell)$ be the number of sequences which are possible and which start in state $i$, end in state $j$ and are of length $\ell$. These quantities are readily seen to satisfy the difference equations

$$N_{ij}(\ell) = \sum_{k,\alpha} N_{kj}\bigl(\ell - \ell^{(\alpha)}_{ik}\bigr), \qquad N_{ij}(\ell) = 0 \ \text{for } \ell < 0, \quad N_{ij}(0) = \delta_{ij}. \tag{1}$$

The corresponding characteristic equations are

$$A_i = \sum_{j,\alpha} A_j\, W^{-\ell^{(\alpha)}_{ij}}. \tag{2}$$

Let $W$ be the largest real root (there is a positive real root by a previous result based on the fixed point theorem) of the determinant equation

$$\Bigl|\, \sum_{\alpha} W^{-\ell^{(\alpha)}_{ij}} - \delta_{ij} \Bigr| = 0,$$

and let $A_i$ be a corresponding (positive) solution of (2). We will assume the graph of the constraints is fully connected, so it is possible to go from any state to any other.
Then all the $A_i$ are positive (none vanish). We will now show that the number of sequences of length $\ell$ starting in state $i$ and ending in $j$, $N_{ij}(\ell)$, is bounded by $\frac{A_i}{A_j} W^{\ell}$. This is certainly true for $\ell < 0$ and also for $\ell = 0$, since then both sides are one if $i = j$, and otherwise the left side is zero with the right positive. We now proceed by the inductive type process as before, assuming the inequality out to some $\ell_1$ and then showing it follows for $\ell$ out to $\ell_1$ plus the minimum $\ell^{(\alpha)}_{ij}$:

$$N_{ij}(\ell) = \sum_{k,\alpha} N_{kj}\bigl(\ell - \ell^{(\alpha)}_{ik}\bigr) \le \sum_{k,\alpha} \frac{A_k}{A_j}\, W^{\ell - \ell^{(\alpha)}_{ik}} = \frac{A_i}{A_j}\, W^{\ell}.$$

Thus the inductive step carries the inequality up to $\ell_1 + \min \ell^{(\alpha)}_{ij}$ and hence it is true for all $\ell$.

An Alternative Proof that $N(\ell) \le W^{\ell}$

Consider the case of a sequence of letters of different lengths $a_1, a_2, \ldots, a_g$ with no constraints. We wish to prove that $N(\ell) \le W^{\ell}$, where $W$ satisfies $\sum_i W^{-a_i} = 1$. Assume, in contradiction, that for some $\ell$, $N(\ell) > W^{\ell}$. Then, since $N(0) \le W^0$, there is a greatest lower bound of $\ell$'s, say $\ell^*$, for which the theorem fails. In the interval $\ell^* \le \ell < \ell^* + a_1$ there must be an $\ell$, say $\ell_1$, for which the theorem fails ($a_1$ is the smallest $a_i$). Subdivide the sequences of length $\ell_1$ into subsets according to the first letter. Let the fractional number in the subset beginning with the letter $i$ be $f_i$ ($i = 1, 2, \ldots, g$). Choose the subset for which $a_i^{-1} \log f_i^{-1}$ is a minimum. In a sense, this means the subset which conveys the least information, $\log f_i^{-1}$, per unit time in its first letter. The minimum value of $a_i^{-1} \log f_i^{-1}$ among the different subsets is less than or equal to $\log W$. To see this, suppose, in contradiction, that for all $i$, $a_i^{-1} \log f_i^{-1} > \log W$. Then $f_i < W^{-a_i}$ and, summing on $i$, $1 = \sum_i f_i < \sum_i W^{-a_i} = 1$, a contradiction. Hence the subset chosen will have $a_i^{-1} \log f_i^{-1} \le \log W$, or $f_i \ge W^{-a_i}$.
If we delete the first letter from all sequences in this subset, we are left with a set of more than $W^{\ell_1 - a_i}$ sequences of length $\ell_1 - a_i$. Thus $N(\ell_1 - a_i) > W^{\ell_1 - a_i}$. Since $\ell_1 - a_i < \ell^*$, this contradicts the assumption that $\ell^*$ was the greatest lower bound of $\ell$'s for which the theorem fails. Hence the theorem is true for all $\ell$.

Characteristic for a Language with Independent Letters

Suppose we have a stochastic process generating a language consisting of a sequence of independent letters. These letters are all chosen with the probabilities $p_i$ for letter $i$, $i = 1, 2, \ldots, g$. We consider sequences of $n$ such letters, that is, words of length $n$ in the language. Suppose that all such words are arranged in order of decreasing probability from the most probable one, consisting of a sequence of $n$ most probable letters, down to the sequence of $n$ least probable letters. The logarithm of the probability of any particular word is (because of the independence of letters) the sum of the logarithms of the probabilities of the individual letters. Thus, the logarithm of the probability of a word is a random variable which is the sum of $n$ independent random variables each with the same distribution function. We may, therefore, apply previous results concerning the tails of such a distribution to estimate the probability in our monotone sequence of all words beyond a certain point.

The distribution of $\log p^{-1}$ for a single letter will have a moment generating function

$$M(s) = \sum_i p_i\, e^{s \log p_i^{-1}} = \sum_i p_i^{1-s}.$$

Hence

$$\mu(s) = \log \sum_i p_i^{1-s}, \qquad \mu'(s) = \frac{\sum_i p_i^{1-s} \log p_i^{-1}}{\sum_i p_i^{1-s}}. \tag{1}$$

Our upper bound on the tail of a distribution then shows that the total probability $P_T$ of all sequences whose individual probability $P$ satisfies

$$\frac{1}{n} \log P \le -\mu'(s) = -\frac{\sum_i p_i^{1-s} \log p_i^{-1}}{\sum_i p_i^{1-s}} \tag{2}$$

is bounded by

$$\frac{1}{n} \log P_T \le \mu(s) - s\,\mu'(s) = \log \sum_i p_i^{1-s} - s\, \frac{\sum_i p_i^{1-s} \log p_i^{-1}}{\sum_i p_i^{1-s}}. \tag{3}$$

This last expression, as well as (1), can be written more compactly in terms of a new set of probabilities

$$q_i(s) = \frac{p_i^{1-s}}{\sum_j p_j^{1-s}}.$$

The relations (2) and (3) now
become, after some manipulation,

$$\frac{1}{n} \log P \le -\sum_i q_i(s) \log p_i^{-1}, \qquad \frac{1}{n} \log P_T \le \sum_i q_i(s) \log \frac{p_i}{q_i(s)}. \tag{4}$$

This is one of the results we desire, an overbound on the tail of the distribution of probability for sequences. We now desire a similar bound on the number of sequences whose probability is greater than $P$. To this end, consider constructing all sequences of length $n$ giving each letter probability $1/g$ (instead of the probabilities $p_i$ they actually have). We again consider the distribution of the sum of the logarithms of the probabilities (using the original $p_i$ values) for the letters in a word. Note that the sequences arranged in monotone order are in the same order as previously. Under these new conditions the moment generating function $M_1(s)$ and its logarithm $\mu_1(s)$ are given by

$$M_1(s) = \frac{1}{g} \sum_i p_i^{s}, \qquad \mu_1(s) = \log \sum_i p_i^{s} - \log g.$$

The total probability $P_2$ of all sequences in the tail of the distribution beyond the sequences whose individual probability $P$ satisfies

$$\frac{1}{n} \log P = \mu_1'(s) = \frac{\sum_i p_i^{s} \log p_i}{\sum_i p_i^{s}} \tag{5}$$

will be bounded by

$$\frac{1}{n} \log P_2 \le \mu_1(s) - s\,\mu_1'(s) = \log \sum_i p_i^{s} - s\, \frac{\sum_i p_i^{s} \log p_i}{\sum_i p_i^{s}} - \log g.$$

We note first that in this modified probability system (each letter with probability $1/g$) all sequences have probability $g^{-n}$, and consequently the number of sequences $N_2$ in the tail whose total probability is $P_2$ is precisely $P_2\, g^{n}$. Hence the number $N_2$ in the tail is bounded by

$$\frac{1}{n} \log N_2 = \frac{1}{n} \log P_2\, g^{n} = \frac{1}{n} \log P_2 + \log g \le -s\, \frac{\sum_i p_i^{s} \log p_i}{\sum_i p_i^{s}} + \log \sum_i p_i^{s}.$$

In order to compare this result with the preceding one (4), we must identify the points at which the tails of the distributions are cut off. This can be done by equating the probabilities $P$ of the individual sequences at the cutoff point.
Thus, using (1) and (5) and writing $\sigma$ in the latter in place of $s$, we have

$$\frac{\sum_i p_i^{1-s} \log p_i}{\sum_i p_i^{1-s}} = \frac{\sum_i p_i^{\sigma} \log p_i}{\sum_i p_i^{\sigma}}.$$

This is obviously satisfied by $1 - s = \sigma$, and since $\mu''(s) > 0$ the left term is a strictly monotone function of $s$ and therefore this solution is unique. The number of sequences now becomes, in terms of the $s$ involved in (1) and (4),

$$\frac{1}{n} \log N_2 \le \log \sum_i p_i^{1-s} + (1 - s)\, \frac{\sum_i p_i^{1-s} \log p_i^{-1}}{\sum_i p_i^{1-s}}.$$

Again using the $q_i(s)$ to simplify,

$$\frac{1}{n} \log N_2 \le \sum_i q_i(s) \log q_i(s)^{-1}. \tag{6}$$

Both the bounds (4) and (6) are also the limiting values approached by $\frac{1}{n} \log P_T$ and $\frac{1}{n} \log N_2$ as $n \to \infty$. This follows from remarks concerning the tails of distributions made in an earlier section. Thus the reliability curve of a source of the type we are discussing here, with independent letters, may be written in parametric form as follows:

$$E(s) = \sum_i q_i(s) \log \frac{q_i(s)}{p_i} = -\bigl(\mu(s) - s\,\mu'(s)\bigr) \tag{7}$$

$$R(s) = \sum_i q_i(s) \log q_i(s)^{-1} = \mu(s) - (s - 1)\,\mu'(s) \tag{8}$$

where

$$q_i(s) = \frac{p_i^{1-s}}{\sum_j p_j^{1-s}}. \tag{9}$$

The parameter $s$ in these equations is related to the slope of the reliability curve. In fact, we note that

$$\frac{dE}{dR} = \frac{dE/ds}{dR/ds} = \frac{s\,\mu''(s)}{(1 - s)\,\mu''(s)} = \frac{s}{1 - s}.$$

Thus, as $s$ increases from $0$ to $1$, the slope increases monotonically from $0$ to $\infty$. It is interesting that at $s = 1$ the formulas (7), (8) become

$$E(1) = -\frac{1}{g} \sum_i \log p_i - \log g, \qquad R(1) = \log g.$$

The Probability of Error

A problem of importance in information theory is that of studying the behavior of signaling codes that may be used in encoding an information source for a noisy channel and, in particular, the probability of error for the optimal code. This paper is concerned with estimating this probability of error under fairly general conditions. We will find that, to a large extent, the problem can be divided into two parts.
First, there is a problem relating to the information source only (not involving the channel) which involves estimating the probability of error when the source is encoded into a simple standard noiseless channel. The study of this question leads to a certain function which we call the reliability characteristic for the source and which determines, in a certain asymptotic sense when the code blocks are long, how rapidly the probability of error approaches zero. Second, there is a problem relating to the channel only. This leads to a function describing, in a sense, the coding behavior of the channel with regard to probability of error when the code blocks are long. Our final and most basic results show how the two functions may be combined to give optimal behavior (or bounds on optimal behavior) when the source is encoded into the channel.

We will first clarify our terminology, since various writers have used some of the terms involved with quite different meanings. For the most part, we will restrict ourselves to a finite, discrete, memoryless channel. Such a channel is specified by a transition probability matrix $\|p_i(j)\|$. Here $p_i(j)$ is the probability that if input symbol $i$ is used, the output will be $j$, and we have

$$p_i(j) \ge 0, \qquad \sum_j p_i(j) = 1.$$

Matrices satisfying the conditions that all elements are nonnegative and the row sums are unity occur often in probability and are called stochastic matrices. The input symbols to the channel will be called the input letters, the set of these the input alphabet. The output symbols of the channel will be called the output letters and the set of these the output alphabet. A channel is often conveniently represented by a line diagram of the type shown in Fig. 1. The channel being memoryless means that successive operations are independent. If the input letters $i$ and $j$ are used, the probability of output letters $k$ and $\ell$ will be $p_i(k)\, p_j(\ell)$.

A sequence of input letters will be called an input word, a sequence of output letters an output word. A collection of $M$ input words all of length $n$ will be called a block code of length $n$. $R = \frac{1}{n} \log M$ will be called the input rate for this code. Unless otherwise specified, a code will mean such a block code. A detection system for a code is a method of interpreting output words as input words, that is, an association or mapping of one of the input words of the code for every output word of length $n$. The probability of error for a particular input word is the probability, if this input is used, that it will be interpreted incorrectly. It is, therefore, the probability of that input word being received as an output word which is not detected as the input word. The probability of error for a code is the average probability of error for all input words in the code. An optimal code of length $n$ is one which minimizes this probability of error (when using its best detection system). These input words $u_1, u_2, \ldots, u_M$ need not all be different.

Our main problem is to estimate for a general channel upper and lower bounds on the probability of error for an optimal code as a function of the length of the code $n$ and the rate of transmission $R$. The ideal solution would be to find a simple explicit formula for the probability of error in an arbitrary channel as a function of the rate of transmission $R$ and the length of the code words $n$. This is probably too much to hope for in view of the diophantine complexities of optimal codes. Barring such a complete solution, one may still hope for upper and lower bounds on $P_e$ and perhaps results relating to its asymptotic behavior when $n$ is large. Most of the present paper is devoted to this type of result.

In studying the asymptotic behavior, it will appear that $P_e$, for a fixed rate $R$ and a given channel, varies approximately exponentially with $n$. For this reason it is convenient to introduce a new term.
If a device or a system has a probability $P$ of making an error, we shall call $-\log P$ the reliability of the device or system. We have just said in effect that for large $n$ the reliability for optimal codes varies essentially linearly with $n$, that is, as $E(R) \cdot n$, where $R$ is the rate for the code. More precisely, we define $E(R)$ as follows:

$$E(R) = \limsup_{n \to \infty}\, -\frac{1}{n} \log P_{e\,\mathrm{opt}}.$$

We will call $E(R)$ the reliability characteristic of the channel and attempt to evaluate it or, where we cannot do this, at least place upper and lower bounds on it.

The writer feels that the quantity we have defined as reliability will, in many cases, turn out to be the most appropriate way of measuring probability of error. In previous work by von Neumann on unreliable neuron-type elements and by E. F. Moore and the writer on unreliable relays, the quantity $-\log P$ entered significantly and was the more natural way to describe some of the results. In both these cases the reliability varied rather simply with the redundancy of the error-correcting systems. It is a little like measuring gain on a db scale or ion concentration on a pH scale. While actually little more than a change in scale, the use of these units of reliability in the coding case throws the results into a much more natural and illuminating perspective.

If we have two given channels, it is possible to form a single channel from them in two natural ways, which we call the sum and product of the two channels. The sum of two channels is the channel formed by using inputs from either of the two given channels with the same transition probabilities to the set of output letters consisting of the logical sum of the two output alphabets.
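The sum of two channels can be sketched in a few lines. This is only an illustration, assuming each channel is stored as a list of rows of transition probabilities (one row per input letter); the name `channel_sum` is hypothetical.

```python
def channel_sum(p, q):
    """Transition matrix of the sum of two channels.

    Inputs of either channel keep their own transition probabilities, and
    the output alphabet is the disjoint union of the two output alphabets,
    so an input of one channel can never produce an output of the other.
    """
    r_p = len(p[0])  # number of output letters of the first channel
    r_q = len(q[0])  # number of output letters of the second channel
    top = [list(row) + [0.0] * r_q for row in p]
    bottom = [[0.0] * r_p + list(row) for row in q]
    return top + bottom


first = [[0.5, 0.5], [0.0, 1.0]]
second = [[1.0]]
for row in channel_sum(first, second):
    print(row)
```

Each row of the result still sums to one, so the sum is again a stochastic matrix.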
Thus the sum channel is defined by a transition matrix formed by placing the matrix of one channel below and to the right of that for the other channel and filling the remaining two rectangles with zeros. If $\|p_i(j)\|$ and $\|p'_i(j)\|$ are the individual matrices, the sum has the following matrix:

$$\begin{pmatrix}
p_1(1) & \cdots & p_1(r) & & & \\
\vdots & & \vdots & & 0 & \\
p_t(1) & \cdots & p_t(r) & & & \\
 & & & p'_1(1) & \cdots & p'_1(r') \\
 & 0 & & \vdots & & \vdots \\
 & & & p'_{t'}(1) & \cdots & p'_{t'}(r')
\end{pmatrix}$$

The product of two channels is the channel whose input alphabet consists of all ordered pairs $(i, i')$, where $i$ is a letter from the first channel alphabet and $i'$ from the second, whose output alphabet is the similar set of ordered pairs of letters from the two individual output alphabets, and whose transition probability from $(i, i')$ to $(j, j')$ is $p_i(j)\, p'_{i'}(j')$.

[Fig. 2]

Zero Error Codes and the Zero Error Capacity $C_0$

In a discrete channel we will say that two input letters are adjacent if there is an output letter which can be caused by either of these two. Thus, $i$ and $j$ are adjacent if there exists a $t$ such that both $p_i(t)$ and $p_j(t)$ do not vanish. In Fig. 1, a and c are adjacent, while a and d are not.

If all input letters are adjacent to each other, any code with more than one word has a probability of error greater than zero. In fact, the probability of error satisfies

$$P_e \ge \frac{M - 1}{M}\, p_{\min}^{\,n},$$

where $p_{\min}$ is the smallest among the $p_i(j)$, $n$ is the length of the code and $M$ is the number of words in the code. To prove this, note that any two words have a possible output word in common, namely the word consisting of the sequence of common output letters when the two input words are compared letter by letter. Each of the two input words has a probability at least $p_{\min}^{\,n}$ of producing this common output word. In using the code, the two particular input words will each occur $\frac{1}{M}$ of the time and will cause the common output $\frac{1}{M}\, p_{\min}^{\,n}$ of the time. This output can be decoded in only one way. Hence at least one of these situations leads to an error.
This error, $\frac{1}{M}\, p_{\min}^{\,n}$, is assigned to this code word, and from the remaining $M - 1$ code words another pair is chosen. A source of error to the amount $\frac{1}{M}\, p_{\min}^{\,n}$ is assigned in similar fashion to one of these, and this is a disjoint event. Continuing in this manner, we obtain a total of $\frac{M - 1}{M}\, p_{\min}^{\,n}$ probability of error. It follows that for any rate $R$ greater than zero (i.e. $M \ge 2$),

$$-\frac{1}{n} \log P_e \le -\log p_{\min} + \frac{1}{n} \log 2, \qquad E \le -\log p_{\min}.$$

If it is not true that the input letters are all adjacent to each other, it is possible to transmit at a positive rate with zero probability of error. The least upper bound of all rates which can be achieved with zero probability of error will be called the zero error capacity of the channel and denoted by $C_0$. If we let $M_0(n)$ be the largest number of words in a code of length $n$, no two of which are adjacent, then $C_0$ is the least upper bound of the numbers $\frac{1}{n} \log M_0(n)$ when $n$ varies through all positive integers.

An interesting problem which has not been completely solved is that of evaluating $C_0$ for an arbitrary channel. One might expect that $C_0$ would be equal to $\log M_0(1)$, that is, that if we choose the largest possible set of non-adjacent letters and form all sequences of these of length $n$, then this would be the best error free code of length $n$. This is not, in general, true, although it holds in many cases, particularly when the number of input letters is small. The first failure occurs with five input letters with the channel in Fig. 2. In this channel, it is possible to choose at most two independent letters, for example 0 and 2. Using sequences of these, 00, 02, 20, and 22, we obtain four words in a code of length two. However, it is possible to construct a code of length two with five members, no two of which are adjacent, as follows: 00, 12, 24,
31, 43. It is readily verified that no two of these are adjacent. Thus, $C_0$ for this channel is at least $\frac{1}{2} \log 5$.

No method has been found for determining $C_0$ for the general discrete channel, and this we propose as an important unsolved problem in coding theory. We shall develop a number of results which enable one to determine $C_0$ in many special cases, for example, in all channels with five or less inputs with the single exception of the channel of Fig. 2 (or channels equivalent in adjacency structure to it). We will also develop some general inequalities enabling one to estimate $C_0$ quite closely in most cases.

It may be seen, in the first place, that the value of $C_0$ depends only on which input letters are adjacent to each other. Let us define an adjacency matrix for a channel, $A_{ij}$, as follows:

$$A_{ij} = \begin{cases} 1 & \text{if input letter } i \text{ is adjacent to } j \text{ or if } i = j \\ 0 & \text{otherwise.} \end{cases}$$

Suppose two channels have the same adjacency matrix (possibly after renumbering the input letters of one of them). Then it is obvious that a zero error code for one will be a zero error code for the other and, hence, that the zero error capacity $C_0$ for one will also apply to the other.

The adjacency structure contained in the adjacency matrix can also be represented as a linear graph. Construct a graph with as many vertices as there are input symbols, and connect two distinct vertices with a line or branch of the graph if the corresponding input letters are adjacent. Some examples are shown in Fig. 3, corresponding to the channels of Fig. 1 and 2.

Theorem: The zero error capacity $C_0$ of a discrete memoryless channel is bounded by the inequalities

$$-\log \min_{P_i} \sum_{i,j} A_{ij} P_i P_j \le C_0 \le \min_{p_i(j)} C, \qquad P_i \ge 0, \quad \sum_i P_i = 1,$$

where $C$ is the capacity of any channel with transition probabilities $p_i(j)$ and having the adjacency matrix $A_{ij}$.

The upper bound is fairly obvious.
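The five-word code 00, 12, 24, 31, 43 quoted for the channel of Fig. 2 can be checked mechanically. The following is a minimal sketch, assuming (the figure is not reproduced in this text) that Fig. 2 is the five-letter channel in which each input letter is adjacent only to its two cyclic neighbours, and taking two words to be adjacent when they can produce a common output word, that is, when their letters are equal or adjacent in every position. All function names are illustrative.

```python
from itertools import combinations

N_LETTERS = 5  # input letters 0..4 of the assumed Fig. 2 channel


def letters_adjacent(i, j):
    # Assumption: letter i is adjacent exactly to i - 1 and i + 1 (mod 5).
    return (i - j) % N_LETTERS in (1, N_LETTERS - 1)


def words_adjacent(u, v):
    # Two words share a possible output word only if, in every position,
    # the two letters are equal or adjacent.
    return all(a == b or letters_adjacent(a, b) for a, b in zip(u, v))


def is_zero_error_code(words):
    # A zero error code contains no pair of adjacent (confusable) words.
    return not any(words_adjacent(u, v) for u, v in combinations(words, 2))


code = [(0, 0), (1, 2), (2, 4), (3, 1), (4, 3)]
print(is_zero_error_code(code))  # True

# Largest set of mutually non-adjacent single letters.
largest_single = max(
    k
    for k in range(1, N_LETTERS + 1)
    if any(
        is_zero_error_code([(x,) for x in subset])
        for subset in combinations(range(N_LETTERS), k)
    )
)
print(largest_single)  # 2
```

Under these assumptions at most two single letters are mutually non-adjacent, while the five words of length two are pairwise non-adjacent, which is the source of the bound $C_0 \ge \frac{1}{2} \log 5$.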
The zero error capacity is certainly less than or equal to the ordinary capacity for any channel, since the former requires codes with zero probability of error while the latter requires codes approaching zero probability of error. By minimizing the capacity through variation of the $p_i(j)$ we find the lowest upper bound available through this argument. Since the capacity is a continuous function of the $p_i(j)$ in the closed region defined by $p_i(j) \ge 0$, $\sum_j p_i(j) = 1$, we may write min instead of greatest lower bound.

It is worth noting that it is only necessary to consider a particular channel in performing this minimization, although there are an infinite number with the same adjacency matrix. This one particular channel is obtained as follows from the adjacency matrix. If $A_{ik} = 1$ for a pair $i, k$, define an output letter $j$ with $p_i(j)$ and $p_k(j)$ both differing from zero. Now if there are any three input letters, say $i, k, l$, all adjacent to each other, define an output letter, say $m$, with $p_i(m)$, $p_k(m)$, $p_l(m)$ all different from zero. In the graph this corresponds to a complete subgraph with three vertices. Next, subsets of four letters or complete subgraphs of four vertices, say $i, k, l, m$, are given an output letter, each being connected to it, and so on. It is evident that any channel with the same adjacency matrix differs from that just described only by variation in the number of ou